API Design in the AI Era: Designing RESTful Interfaces That Work Well for AI
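The story: from a 2-day integration to a 2-hour integration
In September 2025, Wang Fang sat staring at a Slack message.
She was the Tech Lead of the AI platform team at a SaaS company. The team had spent 3 months building an AI document analysis API: upload a document, process it asynchronously, query the result, fetch an AI summary. It was feature-rich and passed every internal test.
Then they opened the API to external partners.
Xiao Liu, a front-end engineer at the first partner, needed 2 days to integrate it and sent back a long list of complaints:
- "Your streaming endpoint returns JSON one moment and plain text the next. I have no idea how to parse it."
- "The task polling endpoint never tells me how often to poll, so I polled every second and took your service down."
- "Every error comes back as HTTP 200. How am I supposed to tell success from failure?"
- "The docs use snake_case parameter names but the code uses camelCase. Which one is right?"
In the retrospective, Wang Fang's team found the root cause: the API had been designed with an "internal API" mindset. As long as their own people could understand it, that was enough, and nobody had considered the integration experience of external developers.
The team spent 2 weeks redesigning the API and rebuilt it according to the conventions described in this article. The second partner finished integrating in 2 hours, with zero complaints.
That is the value of good API design.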
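1. The special challenges of AI API design
1.1 Core differences between traditional REST APIs and AI APIs
| Dimension | Traditional REST API | AI API |
|---|---|---|
| Response time | Milliseconds, synchronous | Milliseconds to minutes, highly variable |
| Response content | Structured JSON | Unstructured text, streamed |
| Idempotency | Usually safe to retry | Each call may return something different (non-deterministic) |
| Cost | Fixed | Billed per token, hard to estimate |
| Progress feedback | No intermediate state needed | Real-time progress needed |
| Error types | Mostly system errors | Adds AI-specific errors such as content safety and token limits |
1.2 Three typical AI API scenarios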
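2. Streaming response API design: SSE conventions
2.1 SSE protocol basics
Server-Sent Events (SSE) is a natural fit for streaming AI responses. Compared with WebSocket:
- One-way push (server → client), which matches how AI generation works
- Runs over plain HTTP, so it passes through most firewalls and proxies without a protocol upgrade handshake
- Built-in automatic reconnection
- Native support in all modern browsers through EventSource (which only issues GET requests; a POST endpoint like the one in 2.3 has to be read with fetch or a client library)
SSE message format:
# Standard SSE format
id: <event ID, used to resume after a disconnect>
event: <event type, optional>
data: <payload; the protocol allows any text, this API always sends single-line JSON>
retry: <reconnect delay in milliseconds, optional>
<blank line>  # a blank line ends the message
2.2 AI streaming event type design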
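Clients dispatch on the `event:` field, so it helps to see how a raw stream in the format from 2.1 splits into frames. The following is a minimal sketch in plain Java, not a complete implementation of the SSE specification: `SseFrame` and `SseFrameParser` are hypothetical names, the `retry` field is ignored, and the `id` is reset per frame for simplicity (a real client remembers the last ID across frames).

```java
import java.util.ArrayList;
import java.util.List;

/** One parsed SSE frame; fields that were absent on the wire are null. */
record SseFrame(String id, String event, String data) {}

final class SseFrameParser {
    /** Splits a raw text/event-stream body into frames separated by blank lines. */
    static List<SseFrame> parse(String body) {
        List<SseFrame> frames = new ArrayList<>();
        String id = null, event = null;
        StringBuilder data = new StringBuilder();
        boolean hasData = false;
        for (String line : body.split("\r?\n", -1)) {
            if (line.isEmpty()) {                  // blank line: the frame is complete
                if (hasData) frames.add(new SseFrame(id, event, data.toString()));
                id = null; event = null; data.setLength(0); hasData = false;
                continue;
            }
            if (line.startsWith(":")) continue;    // comment line, often used as keep-alive
            int colon = line.indexOf(':');
            String field = colon < 0 ? line : line.substring(0, colon);
            String value = colon < 0 ? "" : line.substring(colon + 1);
            if (value.startsWith(" ")) value = value.substring(1); // strip one optional space
            switch (field) {
                case "id" -> id = value;
                case "event" -> event = value;
                case "data" -> {
                    if (hasData) data.append('\n'); // several data lines are joined with \n
                    data.append(value);
                    hasData = true;
                }
                default -> { }                      // retry and unknown fields are ignored here
            }
        }
        return frames;
    }
}
```

A caller would then switch on `frame.event()` (`chunk`, `done`, `error`, `ping`) and hand `frame.data()` to a JSON parser.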
# A data chunk emitted while generation is in progress
event: chunk
data: {"type":"chunk","content":"Java虚拟线程","index":0,"model":"gpt-4o"}

# Generation finished
event: done
data: {"type":"done","usage":{"inputTokens":245,"outputTokens":892,"totalTokens":1137},"finishReason":"stop"}

# Error event
event: error
data: {"type":"error","code":"CONTENT_FILTER_TRIGGERED","message":"内容被安全策略过滤","requestId":"req_abc123"}

# Heartbeat (keeps the connection from timing out)
event: ping
data: {"type":"ping","timestamp":1726192800000}
2.3 Complete Spring Boot SSE Controller
// AiStreamController.java
package com.laozhang.api.controller;
import com.laozhang.api.model.*;
import com.laozhang.api.service.AiStreamService;
import com.laozhang.api.validation.ValidRequest;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.media.Content;
import io.swagger.v3.oas.annotations.media.Schema;
import io.swagger.v3.oas.annotations.responses.ApiResponse;
import io.swagger.v3.oas.annotations.tags.Tag;
import jakarta.validation.Valid;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.http.MediaType;
import org.springframework.http.codec.ServerSentEvent;
import org.springframework.validation.annotation.Validated;
import org.springframework.web.bind.annotation.*;
import reactor.core.publisher.Flux;
import java.time.Duration;
import java.util.UUID;
@RestController
@RequestMapping("/api/v2/ai")
@Tag(name = "AI生成接口", description = "支持流式和非流式的AI内容生成")
@Validated
@Slf4j
@RequiredArgsConstructor
public class AiStreamController {
private final AiStreamService aiStreamService;
/**
* 流式文本生成接口(SSE)
*
* 适用场景:实时对话、代码补全、短文本生成(预期3秒内完成)
* 响应格式:Server-Sent Events
*
* 事件类型:
* - chunk: 文本数据块
* - done: 生成完成(含Token统计)
* - error: 生成错误
* - ping: 心跳保活(每30秒)
*/
@PostMapping(
value = "/generate/stream",
produces = MediaType.TEXT_EVENT_STREAM_VALUE
)
@Operation(
summary = "流式AI文本生成",
description = "使用SSE实时推送生成内容,适合对话场景"
)
@ApiResponse(
responseCode = "200",
description = "流式内容(SSE格式)",
content = @Content(
mediaType = "text/event-stream",
schema = @Schema(implementation = StreamChunkEvent.class)
)
)
public Flux<ServerSentEvent<Object>> generateStream(
@Valid @RequestBody StreamGenerateRequest request,
@RequestHeader(value = "X-Request-Id",
defaultValue = "") String requestId) {
// 生成请求ID(如果客户端没有提供)
String reqId = requestId.isBlank()
? UUID.randomUUID().toString()
: requestId;
log.info("收到流式生成请求 requestId={}, model={}, promptLength={}",
reqId, request.model(), request.prompt().length());
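// Signals the heartbeat to stop once the content stream terminates
reactor.core.publisher.Sinks.Empty<Void> finished = reactor.core.publisher.Sinks.empty();
Flux<ServerSentEvent<Object>> events = Flux.concat(
        // 1. Emit a "processing" status event
        Flux.just(buildStatusEvent(reqId, "processing")),
        // 2. Stream the generated content
        aiStreamService.streamGenerate(request, reqId)
            .map(chunk -> ServerSentEvent.<Object>builder()
                .id(String.valueOf(chunk.index()))
                .event("chunk")
                .data(chunk)
                .build()),
        // 3. Emit the final "done" event with token usage
        aiStreamService.getUsageStats(reqId)
            .map(usage -> ServerSentEvent.<Object>builder()
                .event("done")
                .data(new DoneEvent("done", usage, "stop"))
                .build())
    )
    .doOnError(e -> log.error("Streaming generation failed requestId={}", reqId, e))
    // Turn a failure into a terminal "error" event instead of silently dropping the connection
    .onErrorResume(e -> Flux.just(ServerSentEvent.<Object>builder()
        .event("error")
        .data(buildErrorEvent(reqId, e))
        .build()))
    .doFinally(signal -> finished.tryEmitEmpty());
// Heartbeat every 30 seconds. Flux.interval never completes on its own,
// so it is cut off when the content stream finishes; otherwise the response would stay open forever.
Flux<ServerSentEvent<Object>> pings = Flux.interval(Duration.ofSeconds(30))
    .map(tick -> ServerSentEvent.<Object>builder()
        .event("ping")
        .data(new PingEvent("ping", System.currentTimeMillis()))
        .build())
    .takeUntilOther(finished.asMono());
return Flux.merge(events, pings);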
}
/**
* 非流式AI生成(同步等待完整响应)
* 适用于短文本、结构化输出场景
*/
@PostMapping("/generate")
@Operation(summary = "同步AI文本生成", description = "适合短文本场景,最多等待30秒")
public GenerateResponse generate(
@Valid @RequestBody GenerateRequest request) {
return aiStreamService.generate(request);
}
private ServerSentEvent<Object> buildStatusEvent(String reqId, String status) {
return ServerSentEvent.<Object>builder()
.event("status")
.data(new StatusEvent("status", reqId, status,
System.currentTimeMillis()))
.build();
}
private ErrorEvent buildErrorEvent(String reqId, Object error) {
return new ErrorEvent("error", "INTERNAL_ERROR",
"生成过程中发生错误", reqId);
}
}
2.4 Request/response DTO design
// StreamGenerateRequest.java
package com.laozhang.api.model;
import io.swagger.v3.oas.annotations.media.Schema;
import jakarta.validation.constraints.*;
@Schema(description = "流式生成请求")
public record StreamGenerateRequest(
@NotBlank(message = "prompt不能为空")
@Size(max = 32000, message = "prompt不能超过32000字符")
@Schema(description = "用户输入的提示词", example = "帮我分析一下Java虚拟线程的优缺点")
String prompt,
@Schema(description = "使用的AI模型", example = "gpt-4o",
allowableValues = {"gpt-4o", "gpt-4o-mini", "claude-3-5-sonnet"})
@Pattern(regexp = "gpt-4o|gpt-4o-mini|claude-3-5-sonnet",
message = "不支持的模型")
String model,
@Min(value = 1, message = "max_tokens最小为1")
@Max(value = 4096, message = "max_tokens最大为4096")
@Schema(description = "最大生成Token数", example = "1024", defaultValue = "1024")
Integer maxTokens,
@DecimalMin(value = "0.0", message = "temperature最小为0")
@DecimalMax(value = "2.0", message = "temperature最大为2")
@Schema(description = "随机性控制,0为确定性,2为高随机", example = "0.7")
Double temperature,
@Schema(description = "是否开启流式响应", defaultValue = "true")
Boolean stream,
@Schema(description = "用户标识,用于限流和审计")
String userId,
@Schema(description = "会话ID,用于多轮对话上下文")
String sessionId
) {
// 默认值处理
public StreamGenerateRequest {
if (model == null || model.isBlank()) model = "gpt-4o";
if (maxTokens == null) maxTokens = 1024;
if (temperature == null) temperature = 0.7;
if (stream == null) stream = true;
}
}
// StreamChunkEvent.java
@Schema(description = "流式内容块事件")
public record StreamChunkEvent(
@Schema(description = "事件类型,固定为chunk") String type,
@Schema(description = "本次生成的文本内容") String content,
@Schema(description = "当前块的序号,从0开始") int index,
@Schema(description = "使用的模型名称") String model
) {}
// DoneEvent.java
@Schema(description = "生成完成事件")
public record DoneEvent(
@Schema(description = "事件类型,固定为done") String type,
@Schema(description = "Token使用统计") TokenUsage usage,
@Schema(description = "停止原因:stop/length/content_filter") String finishReason
) {}
// TokenUsage.java
@Schema(description = "Token使用统计")
public record TokenUsage(
@Schema(description = "输入Token数") int inputTokens,
@Schema(description = "输出Token数") int outputTokens,
@Schema(description = "总Token数") int totalTokens,
@Schema(description = "估算成本(美元)") double estimatedCostUsd
) {}
3. Asynchronous long-running task API design: the polling pattern
3.1 Three-step asynchronous API design
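The three steps are: submit the task (202 Accepted plus a task ID), poll its status until it reaches a terminal state, then fetch the result. A minimal sketch of the underlying state machine follows, using the five status values and the endpoint paths that appear later in this section. The enum `TaskState` and its methods are hypothetical names, and the allowed transitions are an assumption inferred from those status values.

```java
/** Lifecycle of an asynchronous task; the transition rules are an assumption. */
enum TaskState {
    QUEUED, PROCESSING, COMPLETED, FAILED, CANCELLED;

    /** True when moving from this state to {@code next} is a legal transition. */
    boolean canMoveTo(TaskState next) {
        return switch (this) {
            case QUEUED -> next == PROCESSING || next == CANCELLED;
            case PROCESSING -> next == COMPLETED || next == FAILED || next == CANCELLED;
            case COMPLETED, FAILED, CANCELLED -> false; // terminal states never change
        };
    }

    /** Terminal states end the polling loop. */
    boolean isTerminal() {
        return this == COMPLETED || this == FAILED || this == CANCELLED;
    }

    /** What the client should call next, given the state it just observed. */
    String nextClientStep() {
        return switch (this) {
            case QUEUED, PROCESSING -> "GET /api/v2/tasks/{taskId}/status"; // step 2: keep polling
            case COMPLETED -> "GET /api/v2/tasks/{taskId}/result";          // step 3: fetch the result
            case FAILED, CANCELLED -> "stop";                               // nothing to fetch
        };
    }
}
```

Enforcing `canMoveTo` on the server side keeps a late worker update from turning a cancelled task back into a processing one.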
3.2 Complete asynchronous task API implementation
// AsyncTaskController.java
package com.laozhang.api.controller;
import com.laozhang.api.model.*;
import com.laozhang.api.service.AsyncTaskService;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.headers.Header;
import io.swagger.v3.oas.annotations.responses.ApiResponse;
import jakarta.validation.Valid;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.servlet.support.ServletUriComponentsBuilder;
import java.net.URI;
@RestController
@RequestMapping("/api/v2/tasks")
@Slf4j
@RequiredArgsConstructor
public class AsyncTaskController {
private final AsyncTaskService taskService;
/**
* 提交异步AI分析任务
* 返回202 Accepted,包含任务ID和轮询URL
*/
@PostMapping
@Operation(summary = "提交AI分析任务",
description = "提交后立即返回,通过taskId轮询状态")
@ApiResponse(
responseCode = "202",
description = "任务已接受,开始排队处理",
headers = {
@Header(name = "Location",
description = "任务状态查询URL"),
@Header(name = "Retry-After",
description = "建议首次查询等待时间(秒)")
}
)
public ResponseEntity<TaskSubmitResponse> submitTask(
@Valid @RequestBody TaskSubmitRequest request) {
TaskSubmitResponse task = taskService.submit(request);
// Location Header:告诉客户端去哪里查询任务状态
URI location = ServletUriComponentsBuilder
.fromCurrentRequest()
.path("/{taskId}/status")
.buildAndExpand(task.taskId())
.toUri();
return ResponseEntity
.status(HttpStatus.ACCEPTED)
.location(location)
.header("Retry-After", estimateWaitSeconds(request))
.body(task);
}
/**
* 查询任务状态
*/
@GetMapping("/{taskId}/status")
@Operation(summary = "查询任务处理状态")
@ApiResponse(responseCode = "200", description = "任务状态信息")
@ApiResponse(responseCode = "404", description = "任务不存在")
public ResponseEntity<TaskStatusResponse> getTaskStatus(
@PathVariable String taskId) {
TaskStatusResponse status = taskService.getStatus(taskId);
ResponseEntity.BodyBuilder builder = ResponseEntity.ok();
// 如果任务还在处理,加上Retry-After建议下次查询时间
if (status.isInProgress()) {
int retryAfter = calculateRetryAfter(status.progress());
builder.header("Retry-After", String.valueOf(retryAfter));
}
return builder.body(status);
}
/**
* 获取任务结果(只有状态为completed时才能调用)
*/
@GetMapping("/{taskId}/result")
@Operation(summary = "获取任务处理结果",
description = "任务状态为completed后才可调用,结果保存72小时")
@ApiResponse(responseCode = "200", description = "AI分析结果")
@ApiResponse(responseCode = "404", description = "任务不存在")
@ApiResponse(responseCode = "409", description = "任务尚未完成")
public ResponseEntity<TaskResultResponse> getTaskResult(
@PathVariable String taskId) {
return taskService.getResult(taskId)
.map(ResponseEntity::ok)
.orElseThrow(() -> new TaskNotCompletedException(taskId));
}
/**
* 取消正在进行的任务
*/
@DeleteMapping("/{taskId}")
@Operation(summary = "取消任务")
@ApiResponse(responseCode = "200", description = "任务已取消")
@ApiResponse(responseCode = "409", description = "任务已完成,无法取消")
public ResponseEntity<TaskCancelResponse> cancelTask(
@PathVariable String taskId) {
taskService.cancel(taskId);
return ResponseEntity.ok(
new TaskCancelResponse(taskId, "cancelled",
System.currentTimeMillis())
);
}
/**
* 根据文件大小估算等待时间
*/
private String estimateWaitSeconds(TaskSubmitRequest request) {
long fileSizeKb = request.fileSizeBytes() / 1024;
// 粗略估算:每KB约需要0.1秒处理
int estimatedSeconds = Math.max(5, (int)(fileSizeKb * 0.1));
return String.valueOf(Math.min(estimatedSeconds, 60));
}
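/**
* Derives the next polling interval from task progress.
* The interval shrinks as the task nears completion; this is progress-based, not exponential backoff.
*/
private int calculateRetryAfter(int progress) {
if (progress < 10) return 5; // just started: check again in 5 seconds
if (progress < 50) return 3; // in progress: check again in 3 seconds
if (progress < 90) return 2; // nearly done: check again in 2 seconds
return 1; // about to finish: check again in 1 second
}
}
3.3 Task status model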
// TaskStatusResponse.java
package com.laozhang.api.model;
import io.swagger.v3.oas.annotations.media.Schema;
@Schema(description = "任务状态响应")
public record TaskStatusResponse(
@Schema(description = "任务ID", example = "task_2x9k3m4n")
String taskId,
@Schema(
description = "任务状态",
allowableValues = {"queued", "processing", "completed", "failed", "cancelled"},
example = "processing"
)
String status,
@Schema(description = "处理进度 0-100", example = "45")
int progress,
@Schema(description = "任务提交时间(Unix毫秒)")
long submittedAt,
@Schema(description = "任务开始处理时间(Unix毫秒),排队中为null")
Long startedAt,
@Schema(description = "任务完成时间(Unix毫秒),未完成为null")
Long completedAt,
@Schema(description = "预估完成时间(Unix毫秒),仅处理中时有值")
Long estimatedCompletedAt,
@Schema(description = "失败原因,仅status=failed时有值")
String failureReason,
@Schema(description = "任务队列位置,仅status=queued时有值")
Integer queuePosition
) {
public boolean isInProgress() {
return "queued".equals(status) || "processing".equals(status);
}
public boolean isCompleted() {
return "completed".equals(status);
}
}
4. WebHook design: pushing a notification when an AI task completes
4.1 WebHook API design principles
// WebhookController.java
package com.laozhang.api.controller;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.laozhang.api.model.*;
import com.laozhang.api.service.WebhookDeliveryService;
import io.swagger.v3.oas.annotations.Operation;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
/**
* WebHook注册管理接口
* 允许客户端注册回调URL,任务完成后主动推送
*/
@RestController
@RequestMapping("/api/v2/webhooks")
@Slf4j
@RequiredArgsConstructor
public class WebhookController {
private final WebhookDeliveryService deliveryService;
/**
* 注册WebHook
*/
@PostMapping
@Operation(summary = "注册WebHook回调")
public ResponseEntity<WebhookRegistration> registerWebhook(
@RequestBody WebhookRegisterRequest request) {
WebhookRegistration registration = deliveryService.register(request);
// 注册成功后立即发送一个测试事件验证URL可达性
deliveryService.sendVerificationEvent(registration.webhookId());
return ResponseEntity.ok(registration);
}
/**
* 查询WebHook投递历史(用于排错)
*/
@GetMapping("/{webhookId}/deliveries")
@Operation(summary = "查询WebHook投递记录")
public ResponseEntity<WebhookDeliveryPage> getDeliveries(
@PathVariable String webhookId,
@RequestParam(defaultValue = "0") int page,
@RequestParam(defaultValue = "20") int size) {
return ResponseEntity.ok(
deliveryService.getDeliveries(webhookId, page, size)
);
}
}
4.2 WebHook signature verification (security-critical)
// WebhookDeliveryService.java
package com.laozhang.api.service;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;
import org.springframework.web.reactive.function.client.WebClient;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.InvalidKeyException;
import java.security.NoSuchAlgorithmException;
import java.time.Instant;
import java.util.Base64;
import java.util.Map;
@Service
@Slf4j
@RequiredArgsConstructor
public class WebhookDeliveryService {
private final WebClient.Builder webClientBuilder;
private final WebhookRepository webhookRepository;
/**
* 异步发送WebHook通知
* 包含签名验证,支持重试
*/
@Async("webhookExecutor")
public void deliverWebhook(String webhookId, WebhookEvent event) {
WebhookRegistration webhook = webhookRepository.findById(webhookId)
.orElseThrow(() -> new WebhookNotFoundException(webhookId));
String payload = serializeEvent(event);
String timestamp = String.valueOf(Instant.now().getEpochSecond());
String signature = computeSignature(payload, timestamp, webhook.secret());
log.info("投递WebHook webhookId={}, eventType={}, attempt={}",
webhookId, event.eventType(), 1);
webClientBuilder
.baseUrl(webhook.targetUrl())
.build()
.post()
.headers(headers -> {
// 标准WebHook安全Header
headers.set("Content-Type", "application/json");
headers.set("X-Webhook-Id", webhookId);
headers.set("X-Webhook-Timestamp", timestamp);
headers.set("X-Webhook-Signature", "v1=" + signature);
headers.set("X-Webhook-Event", event.eventType());
// 幂等Key:相同事件重试不会重复处理
headers.set("X-Idempotency-Key", event.idempotencyKey());
})
.bodyValue(payload)
.retrieve()
.toBodilessEntity()
.retryWhen(buildRetrySpec())
.subscribe(
resp -> log.info("WebHook投递成功 webhookId={}, status={}",
webhookId, resp.getStatusCode()),
err -> log.error("WebHook投递最终失败 webhookId={}", webhookId, err)
);
}
/**
* 计算HMAC-SHA256签名
* 客户端验证方式:signature = HMAC-SHA256(secret, timestamp + "." + payload)
*/
private String computeSignature(String payload, String timestamp, String secret) {
try {
String message = timestamp + "." + payload;
Mac mac = Mac.getInstance("HmacSHA256");
SecretKeySpec keySpec = new SecretKeySpec(
secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256");
mac.init(keySpec);
byte[] signature = mac.doFinal(
message.getBytes(StandardCharsets.UTF_8));
return Base64.getEncoder().encodeToString(signature);
} catch (NoSuchAlgorithmException | InvalidKeyException e) {
throw new RuntimeException("签名计算失败", e);
}
}
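/**
* Exponential backoff: up to 5 retries, starting at 1 minute and doubling each time
* (roughly 1, 2, 4, 8, 16 minutes, with 10% jitter), capped at 8 hours.
* A fixed 1 min / 5 min / 30 min / 2 h / 8 h schedule would need a custom Retry.
*/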
private reactor.util.retry.Retry buildRetrySpec() {
return reactor.util.retry.Retry
.backoff(5, java.time.Duration.ofMinutes(1))
.maxBackoff(java.time.Duration.ofHours(8))
.jitter(0.1);
}
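private static final com.fasterxml.jackson.databind.ObjectMapper OBJECT_MAPPER =
    new com.fasterxml.jackson.databind.ObjectMapper();
private String serializeEvent(WebhookEvent event) {
    try {
        // Serialize once, so the bytes that are signed are exactly the bytes that are sent
        return OBJECT_MAPPER.writeValueAsString(event);
    } catch (com.fasterxml.jackson.core.JsonProcessingException e) {
        throw new IllegalStateException("Failed to serialize WebHook event", e);
    }
}
}
4.3 WebHook event format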
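Receivers of these events should check the signature headers before trusting the payload. The following is a minimal receiver-side sketch that mirrors the signing scheme from 4.2: HMAC-SHA256 over `timestamp + "." + payload`, Base64-encoded, sent as `v1=<signature>`. The class `WebhookVerifier` and the 5-minute replay window are assumptions for illustration, not part of the original service.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;

final class WebhookVerifier {
    /** Assumed tolerance for clock skew and replay protection. */
    static final long MAX_SKEW_SECONDS = 300;

    /** Recomputes the Base64 HMAC-SHA256 over "timestamp.payload". */
    static String sign(String secret, String timestamp, String payload) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] raw = mac.doFinal((timestamp + "." + payload).getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(raw);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 unavailable", e);
        }
    }

    /** Verifies the X-Webhook-Timestamp and X-Webhook-Signature headers against the raw body. */
    static boolean verify(String secret, String timestampHeader, String signatureHeader,
                          String rawBody, long nowEpochSeconds) {
        if (signatureHeader == null || !signatureHeader.startsWith("v1=")) return false;
        long timestamp;
        try {
            timestamp = Long.parseLong(timestampHeader);
        } catch (NumberFormatException e) {
            return false; // missing or malformed timestamp
        }
        // Reject stale deliveries so a captured request cannot be replayed later
        if (Math.abs(nowEpochSeconds - timestamp) > MAX_SKEW_SECONDS) return false;
        byte[] expected = sign(secret, timestampHeader, rawBody).getBytes(StandardCharsets.UTF_8);
        byte[] actual = signatureHeader.substring(3).getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, actual); // constant-time comparison
    }
}
```

The check has to run on the raw request body, before any JSON parsing or re-serialization, because a single changed byte invalidates the signature. After verification, the receiver can deduplicate on the `X-Idempotency-Key` header.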
// WebhookEvent.java
package com.laozhang.api.model;
import io.swagger.v3.oas.annotations.media.Schema;
@Schema(description = "WebHook事件载荷")
public record WebhookEvent(
@Schema(description = "WebHook ID") String webhookId,
@Schema(
description = "事件类型",
allowableValues = {
"task.completed",
"task.failed",
"task.cancelled",
"webhook.verification"
}
)
String eventType,
@Schema(description = "事件触发时间(ISO 8601)")
String timestamp,
@Schema(description = "幂等Key,相同事件重发时Key不变,客户端去重用")
String idempotencyKey,
@Schema(description = "API版本") String apiVersion,
@Schema(description = "事件数据") Object data
) {}
5. OpenAPI 3.0 documentation: a standardized description of AI interfaces
5.1 OpenAPI YAML example (key parts)
# openapi.yaml
openapi: 3.0.3
info:
  title: AI Document Analysis Platform API
  description: |
    ## Overview
    AI-powered document analysis, summary generation, and content extraction.
    ## Authentication
    All endpoints require Bearer token authentication:
    ```
    Authorization: Bearer your_api_key
    ```
    ## Rate limits
    - Standard: 60 requests/minute, 10000 requests/day
    - Professional: 300 requests/minute, 100000 requests/day
    - Requests over the limit receive a 429 status; the response headers carry the reset time
    ## Streaming endpoints
    The streaming endpoint (/ai/generate/stream) uses the Server-Sent Events (SSE) format;
    clients must handle the `text/event-stream` response type.
  version: "2.0.0"
  contact:
    name: API Support
    email: api-support@example.com
servers:
  - url: https://api.example.com/api/v2
    description: Production
  - url: https://sandbox.api.example.com/api/v2
    description: Sandbox
paths:
  /ai/generate/stream:
    post:
      summary: Streaming AI text generation
      operationId: generateStream
      tags: [AI Generation]
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/StreamGenerateRequest'
            examples:
              basicChat:
                summary: Basic chat example
                value:
                  prompt: "帮我解释Java虚拟线程的原理"
                  model: "gpt-4o"
                  maxTokens: 1024
                  temperature: 0.7
      responses:
        '200':
          description: Streamed content (SSE format)
          content:
            text/event-stream:
              schema:
                type: string
              example: |
                event: chunk
                data: {"type":"chunk","content":"Java虚","index":0}

                event: done
                data: {"type":"done","usage":{"inputTokens":45,"outputTokens":892}}
        '400':
          $ref: '#/components/responses/BadRequest'
        '429':
          $ref: '#/components/responses/RateLimitExceeded'
components:
  schemas:
    StreamGenerateRequest:
      type: object
      required: [prompt]
      properties:
        prompt:
          type: string
          maxLength: 32000
          description: User prompt
          example: "帮我分析Java虚拟线程的优缺点"
        model:
          type: string
          enum: [gpt-4o, gpt-4o-mini, claude-3-5-sonnet]
          default: gpt-4o
          description: AI model to use
        maxTokens:
          type: integer
          minimum: 1
          maximum: 4096
          default: 1024
          description: Maximum number of tokens to generate
        temperature:
          type: number
          minimum: 0
          maximum: 2
          default: 0.7
          description: Randomness control
    AiError:
      type: object
      required: [code, message, requestId]
      properties:
        code:
          type: string
          description: Error code (machine-readable)
          example: "TOKEN_LIMIT_EXCEEDED"
        message:
          type: string
          description: Error description (human-readable)
          example: "输入文本超过模型最大Token限制(32000个Token)"
        requestId:
          type: string
          description: Request ID, used by support for troubleshooting
          example: "req_2x9k3m4n5p"
        details:
          type: object
          description: Error details (optional)
  responses:
    BadRequest:
      description: Invalid request parameters
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/AiError'
    RateLimitExceeded:
      description: Rate limit quota exceeded
      headers:
        X-RateLimit-Limit:
          schema:
            type: integer
          description: Total request quota in the window
        X-RateLimit-Remaining:
          schema:
            type: integer
          description: Remaining quota in the current window
        X-RateLimit-Reset:
          schema:
            type: integer
          description: Quota reset time (Unix timestamp)
        Retry-After:
          schema:
            type: integer
          description: Suggested wait before retrying (seconds)
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/AiError'
  securitySchemes:
    BearerAuth:
      type: http
      scheme: bearer
      bearerFormat: API Key
security:
  - BearerAuth: []
5.2 Springdoc OpenAPI integration
// OpenApiConfig.java
package com.laozhang.api.config;
import io.swagger.v3.oas.models.*;
import io.swagger.v3.oas.models.info.Info;
import io.swagger.v3.oas.models.security.SecurityScheme;
import io.swagger.v3.oas.models.servers.Server;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import java.util.List;
@Configuration
public class OpenApiConfig {
@Bean
public OpenAPI customOpenAPI() {
return new OpenAPI()
.info(new Info()
.title("AI文档分析平台 API")
.version("2.0.0")
.description("""
## AI API设计规范
本API遵循以下设计原则:
1. **流式优先**:支持SSE实时推送
2. **异步友好**:长任务使用轮询/WebHook
3. **语义化错误**:AI特有错误码清晰可读
4. **客户端友好**:Retry-After引导重试
""")
)
.servers(List.of(
new Server().url("https://api.example.com").description("生产环境"),
new Server().url("http://localhost:8080").description("本地开发")
))
.components(new Components()
.addSecuritySchemes("BearerAuth",
new SecurityScheme()
.type(SecurityScheme.Type.HTTP)
.scheme("bearer")
.bearerFormat("API Key")));
}
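}
6. Version management: versioning strategy for AI APIs
6.1 Comparing versioning strategies
| Strategy | URL example | Header example | Trade-offs |
|---|---|---|---|
| URL path | /api/v1/generate | - | Intuitive and cache-friendly, but every major version changes the URL |
| Accept header | - | Accept: application/vnd.api.v2+json | Elegant, but harder to debug |
| Custom header | - | API-Version: 2 | A middle ground |
Recommended strategy for AI APIs: support both the URL path and a header.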
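One way to read "URL path plus header" is: the major version comes from the path, a header can supply it when the path carries none, and a separate header selects the minor version. A minimal sketch of that resolution in plain Java follows. `ApiVersionResolver`, the default major version, and the rule that the path wins over the header are assumptions for illustration. The header values passed in would come from `API-Version` (from the table above) and `X-API-Minor-Version` (mentioned in the configuration in this section).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Resolved API version: major from path or header, minor from header. */
record ApiVersion(int major, int minor) {}

final class ApiVersionResolver {
    private static final Pattern PATH_VERSION = Pattern.compile("^/api/v(\\d+)(/.*)?$");
    /** Assumed default when neither path nor header names a major version. */
    static final int DEFAULT_MAJOR = 2;

    static ApiVersion resolve(String path, String apiVersionHeader, String minorVersionHeader) {
        int major;
        Matcher m = PATH_VERSION.matcher(path);
        if (m.matches()) {
            major = Integer.parseInt(m.group(1));                // the path is authoritative
        } else {
            major = parseOrDefault(apiVersionHeader, DEFAULT_MAJOR); // fall back to the header
        }
        int minor = parseOrDefault(minorVersionHeader, 0);
        return new ApiVersion(major, minor);
    }

    /** Parses a header value, falling back when it is absent or not a number. */
    private static int parseOrDefault(String value, int fallback) {
        if (value == null) return fallback;
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }
}
```

Letting the path win avoids ambiguity when a client sends `/api/v1/...` together with `API-Version: 2`. Rejecting such a request with 400 would be an equally defensible choice.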
// ApiVersionConfig.java
package com.laozhang.api.config;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.servlet.config.annotation.PathMatchConfigurer;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;
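/**
* API versioning strategy:
* 1. Major version (v1/v2): bumped on breaking changes
* 2. Minor version (via the X-API-Minor-Version header): compatible changes such as new fields
* 3. Old versions are kept for 12 months before shutdown, with Deprecation header warnings sent 3 months in advance
*/
@org.springframework.context.annotation.Configuration
public class ApiVersionConfig implements WebMvcConfigurer {
@Override
public void configurePathMatch(PathMatchConfigurer configurer) {
// Prefix controllers in the com.laozhang.api.v2 package with /api/v2.
// Controllers that already declare /api/v2 in @RequestMapping must not match, or the prefix is applied twice.
configurer.addPathPrefix("/api/v2",
cls -> cls.isAnnotationPresent(RequestMapping.class)
&& cls.getPackageName().startsWith("com.laozhang.api.v2"));
}
}
6.2 API deprecation warning middleware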
// DeprecationInterceptor.java
package com.laozhang.api.interceptor;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import org.springframework.web.servlet.HandlerInterceptor;
/**
* 在废弃的API响应中注入警告Header
* 客户端可以据此提前做迁移
*/
public class DeprecationInterceptor implements HandlerInterceptor {
@Override
public boolean preHandle(HttpServletRequest request,
HttpServletResponse response,
Object handler) {
String uri = request.getRequestURI();
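// The v1 API will be shut down on 2027-03-01 (a Monday)
if (uri.startsWith("/api/v1/")) {
// Note: RFC 9745 defines Deprecation as a structured-field date ("@<unix seconds>");
// the HTTP-date form used here follows earlier drafts of that header.
response.setHeader("Deprecation",
"Mon, 01 Mar 2027 00:00:00 GMT");
response.setHeader("Sunset",
"Mon, 01 Mar 2027 00:00:00 GMT");
response.setHeader("Link",
"<https://docs.example.com/migration/v1-to-v2>; rel=\"deprecation\"");
// Header values should stay ASCII, so the warning text is in English
response.setHeader("Warning",
"299 - \"The v1 API will be retired on 2027-03-01; please migrate to v2\"");
}
return true;
}
}
7. Error code design: standardizing AI-specific errors
7.1 AI error code system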
// AiErrorCode.java
package com.laozhang.api.error;
import org.springframework.http.HttpStatus;
public enum AiErrorCode {
// ===== 通用错误(400x) =====
INVALID_PARAMETER(400, "INVALID_PARAMETER", "请求参数无效"),
MISSING_REQUIRED_FIELD(400, "MISSING_REQUIRED_FIELD", "缺少必填字段"),
// ===== AI特有输入错误(401x) =====
TOKEN_LIMIT_EXCEEDED(400, "TOKEN_LIMIT_EXCEEDED",
"输入文本超过模型最大Token限制"),
UNSUPPORTED_LANGUAGE(400, "UNSUPPORTED_LANGUAGE",
"不支持的语言,当前支持:中文、英文、日文"),
INVALID_FILE_FORMAT(400, "INVALID_FILE_FORMAT",
"不支持的文件格式,支持:PDF/Word/TXT/MD"),
FILE_TOO_LARGE(400, "FILE_TOO_LARGE",
"文件超过大小限制(最大50MB)"),
// ===== 认证/授权错误(403x) =====
INVALID_API_KEY(401, "INVALID_API_KEY", "API Key无效或已过期"),
QUOTA_EXHAUSTED(403, "QUOTA_EXHAUSTED",
"账户Token配额已用尽,请充值或等待配额重置"),
FEATURE_NOT_AVAILABLE(403, "FEATURE_NOT_AVAILABLE",
"当前套餐不支持此功能"),
// ===== AI安全/内容错误(411x) =====
CONTENT_POLICY_VIOLATION(400, "CONTENT_POLICY_VIOLATION",
"输入内容违反内容安全策略"),
OUTPUT_FILTERED(200, "OUTPUT_FILTERED",
"生成内容被安全过滤,已返回安全版本"),
PII_DETECTED(400, "PII_DETECTED",
"检测到个人敏感信息(PII),请脱敏后重试"),
// ===== 限流错误(429x) =====
RATE_LIMIT_EXCEEDED(429, "RATE_LIMIT_EXCEEDED",
"请求频率超过限制"),
CONCURRENT_LIMIT_EXCEEDED(429, "CONCURRENT_LIMIT_EXCEEDED",
"并发请求数超过限制"),
// ===== AI模型错误(500x) =====
MODEL_UNAVAILABLE(503, "MODEL_UNAVAILABLE",
"AI模型服务暂时不可用,请稍后重试"),
MODEL_TIMEOUT(504, "MODEL_TIMEOUT",
"AI模型响应超时,请减少输入长度后重试"),
GENERATION_FAILED(500, "GENERATION_FAILED",
"AI生成失败,请重试"),
// ===== 任务相关错误 =====
TASK_NOT_FOUND(404, "TASK_NOT_FOUND", "任务不存在"),
TASK_NOT_COMPLETED(409, "TASK_NOT_COMPLETED",
"任务尚未完成,无法获取结果"),
TASK_ALREADY_CANCELLED(409, "TASK_ALREADY_CANCELLED",
"任务已取消"),
TASK_EXPIRED(410, "TASK_EXPIRED",
"任务结果已过期(72小时),请重新提交");
private final int httpStatus;
private final String code;
private final String defaultMessage;
AiErrorCode(int httpStatus, String code, String defaultMessage) {
this.httpStatus = httpStatus;
this.code = code;
this.defaultMessage = defaultMessage;
}
public HttpStatus toHttpStatus() {
return HttpStatus.valueOf(httpStatus);
}
public String getCode() { return code; }
public String getDefaultMessage() { return defaultMessage; }
}
7.2 Unified exception handler
// GlobalExceptionHandler.java
package com.laozhang.api.exception;
import com.laozhang.api.error.AiErrorCode;
import lombok.extern.slf4j.Slf4j;
import org.springframework.http.ResponseEntity;
import org.springframework.validation.FieldError;
import org.springframework.web.bind.MethodArgumentNotValidException;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;
import java.time.Instant;
import java.util.UUID;
import java.util.stream.Collectors;
@RestControllerAdvice
@Slf4j
public class GlobalExceptionHandler {
@ExceptionHandler(AiApiException.class)
public ResponseEntity<ErrorResponse> handleAiException(AiApiException ex) {
log.warn("AI API异常: code={}, message={}", ex.getErrorCode(), ex.getMessage());
ErrorResponse errorResponse = new ErrorResponse(
ex.getErrorCode().getCode(),
ex.getMessage() != null ? ex.getMessage()
: ex.getErrorCode().getDefaultMessage(),
generateRequestId(),
Instant.now().toString(),
ex.getDetails()
);
return ResponseEntity
.status(ex.getErrorCode().toHttpStatus())
.body(errorResponse);
}
@ExceptionHandler(MethodArgumentNotValidException.class)
public ResponseEntity<ErrorResponse> handleValidation(
MethodArgumentNotValidException ex) {
String details = ex.getBindingResult().getFieldErrors()
.stream()
.map(FieldError::getDefaultMessage)
.collect(Collectors.joining("; "));
ErrorResponse errorResponse = new ErrorResponse(
AiErrorCode.INVALID_PARAMETER.getCode(),
"请求参数验证失败: " + details,
generateRequestId(),
Instant.now().toString(),
null
);
return ResponseEntity.badRequest().body(errorResponse);
}
private String generateRequestId() {
return "req_" + UUID.randomUUID().toString().replace("-", "").substring(0, 16);
}
}
// ErrorResponse.java
record ErrorResponse(
String code,
String message,
String requestId,
String timestamp,
Object details
) {}
8. Rate limiting design: token bucket + sliding window
8.1 Two-tier rate limiting strategy
// RateLimitFilter.java
package com.laozhang.api.filter;
import com.laozhang.api.error.AiErrorCode;
import io.github.bucket4j.*;
import jakarta.servlet.*;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.core.annotation.Order;
import org.springframework.stereotype.Component;
import java.io.IOException;
import java.time.Duration;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
@Component
@Order(2)
@Slf4j
@RequiredArgsConstructor
public class RateLimitFilter implements Filter {
// 每个API Key的Token桶(内存存储,生产环境换Redis)
private final ConcurrentMap<String, Bucket> buckets = new ConcurrentHashMap<>();
// 标准版限流配置:60次/分钟
private static final long STANDARD_REQUESTS_PER_MINUTE = 60L;
private static final long STANDARD_REQUESTS_PER_DAY = 10000L;
@Override
public void doFilter(ServletRequest req, ServletResponse resp,
FilterChain chain) throws IOException, ServletException {
HttpServletRequest request = (HttpServletRequest) req;
HttpServletResponse response = (HttpServletResponse) resp;
String apiKey = extractApiKey(request);
if (apiKey == null) {
chain.doFilter(req, resp);
return;
}
Bucket bucket = getOrCreateBucket(apiKey);
// 尝试消耗1个Token
ConsumptionProbe probe = bucket.tryConsumeAndReturnRemaining(1);
if (probe.isConsumed()) {
// 添加限流信息Header
response.setHeader("X-RateLimit-Limit",
String.valueOf(STANDARD_REQUESTS_PER_MINUTE));
response.setHeader("X-RateLimit-Remaining",
String.valueOf(probe.getRemainingTokens()));
response.setHeader("X-RateLimit-Reset",
String.valueOf(System.currentTimeMillis() / 1000 + 60));
chain.doFilter(req, resp);
} else {
// 限流:返回429
long waitForRefillNanos = probe.getNanosToWaitForRefill();
long retryAfterSeconds = (waitForRefillNanos / 1_000_000_000L) + 1;
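response.setStatus(429);
// Declare UTF-8 explicitly; otherwise the servlet default encoding garbles the non-ASCII message
response.setContentType("application/json;charset=UTF-8");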
response.setHeader("Retry-After", String.valueOf(retryAfterSeconds));
response.setHeader("X-RateLimit-Limit",
String.valueOf(STANDARD_REQUESTS_PER_MINUTE));
response.setHeader("X-RateLimit-Remaining", "0");
response.getWriter().write("""
{
"code": "RATE_LIMIT_EXCEEDED",
"message": "请求频率超过限制(60次/分钟),请 %d 秒后重试",
"retryAfterSeconds": %d
}
""".formatted(retryAfterSeconds, retryAfterSeconds));
}
}
private Bucket getOrCreateBucket(String apiKey) {
return buckets.computeIfAbsent(apiKey, key -> {
// 双层限流:分钟级 + 天级
return Bucket.builder()
// 层1:每分钟60次(防突发)
.addLimit(Bandwidth.classic(
STANDARD_REQUESTS_PER_MINUTE,
Refill.intervally(STANDARD_REQUESTS_PER_MINUTE,
Duration.ofMinutes(1))
))
// 层2:每天10000次(防滥用)
.addLimit(Bandwidth.classic(
STANDARD_REQUESTS_PER_DAY,
Refill.intervally(STANDARD_REQUESTS_PER_DAY,
Duration.ofDays(1))
))
.build();
});
}
private String extractApiKey(HttpServletRequest request) {
String auth = request.getHeader("Authorization");
if (auth != null && auth.startsWith("Bearer ")) {
return auth.substring(7);
}
return null;
}
}
9. SDK generation: generating a Java client from OpenAPI
9.1 Maven plugin configuration (automatic SDK generation)
<!-- pom.xml(SDK生成器) -->
<plugin>
<groupId>org.openapitools</groupId>
<artifactId>openapi-generator-maven-plugin</artifactId>
<version>7.8.0</version>
<executions>
<execution>
<goals>
<goal>generate</goal>
</goals>
<configuration>
<!-- OpenAPI规范文件路径 -->
<inputSpec>
${project.basedir}/src/main/resources/openapi.yaml
</inputSpec>
<!-- 生成Java客户端 -->
<generatorName>java</generatorName>
<!-- 输出目录 -->
<output>${project.build.directory}/generated-sources/openapi</output>
<!-- 客户端配置 -->
<configOptions>
<library>resttemplate</library>
<apiPackage>com.laozhang.sdk.api</apiPackage>
<modelPackage>com.laozhang.sdk.model</modelPackage>
<invokerPackage>com.laozhang.sdk</invokerPackage>
<dateLibrary>java8</dateLibrary>
<useOptional>true</useOptional>
<generateBuilders>true</generateBuilders>
<serializationLibrary>jackson</serializationLibrary>
</configOptions>
</configuration>
</execution>
</executions>
</plugin>
9.2 Using the generated SDK
// SDK使用示例(自动生成的客户端)
package com.example.demo;
import com.laozhang.sdk.ApiClient;
import com.laozhang.sdk.api.AiGenerationApi;
import com.laozhang.sdk.model.StreamGenerateRequest;
import com.laozhang.sdk.model.TaskStatusResponse;
public class SdkUsageExample {
public static void main(String[] args) throws Exception {
// 初始化客户端
ApiClient client = new ApiClient();
client.setBasePath("https://api.example.com/api/v2");
client.setBearerToken("your_api_key_here");
AiGenerationApi api = new AiGenerationApi(client);
// 提交异步任务
var request = new StreamGenerateRequest()
.prompt("帮我分析这个合同的风险条款")
.model("gpt-4o")
.maxTokens(2048);
var submitResponse = api.submitTask(request);
String taskId = submitResponse.getTaskId();
System.out.println("任务已提交,taskId: " + taskId);
System.out.println("建议 " + submitResponse.getEstimatedWaitSeconds() + " 秒后查询状态");
// 轮询任务状态(指数退避)
int attempt = 0;
while (true) {
Thread.sleep(calculateDelay(attempt++) * 1000L);
TaskStatusResponse status = api.getTaskStatus(taskId);
System.out.printf("任务状态: %s,进度: %d%%\n",
status.getStatus(), status.getProgress());
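// Generated models expose getters only, so compare the status string
if ("completed".equals(status.getStatus())) {
var result = api.getTaskResult(taskId);
System.out.println("Analysis result: " + result.getContent());
break;
}
// Stop on every other terminal state too, otherwise a cancelled task loops forever
if ("failed".equals(status.getStatus()) || "cancelled".equals(status.getStatus())) {
System.err.println("Task ended with status " + status.getStatus()
+ ": " + status.getFailureReason());
break;
}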
}
}
private static long calculateDelay(int attempt) {
return Math.min(60, (long) Math.pow(2, attempt));
}
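}
10. Performance data and best-practice summary
10.1 Performance comparison of the API patterns
| Metric | Synchronous API | SSE streaming | Async polling | WebHook |
|---|---|---|---|---|
| Time to first byte | N/A | 100~500ms | N/A | N/A |
| Full response time | 1~30s | 1~30s (feels faster because output streams in) | Non-blocking | Non-blocking |
| Concurrent server connections | 1x | 1x | 0.01x (polls are infrequent) | 0 (server pushes) |
| Client complexity | Simple | Medium | Medium | Complex (needs a public endpoint) |
| Best for | Responses under 3s | Showing generation in real time | 3s~3min | Over 3min or batch jobs |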
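The "Best for" row can be read as a simple decision rule on expected processing time. The sketch below encodes only those thresholds. `ApiMode` and `ApiModeSelector` are hypothetical names, and SSE is left out on purpose because it is chosen for real-time display rather than for duration.

```java
import java.time.Duration;

enum ApiMode { SYNC, POLLING, WEBHOOK }

final class ApiModeSelector {
    /** Picks an interaction pattern from the expected processing time. */
    static ApiMode choose(Duration expected) {
        if (expected.compareTo(Duration.ofSeconds(3)) < 0) return ApiMode.SYNC;     // under 3s
        if (expected.compareTo(Duration.ofMinutes(3)) <= 0) return ApiMode.POLLING; // 3s to 3min
        return ApiMode.WEBHOOK;                                                     // over 3min
    }
}
```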
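10.2 Results after the redesign
After the rebuild, Wang Fang's team collected three months of data:
| Metric | Before | After |
|---|---|---|
| Average integration time for external developers | 2 days | 2 hours |
| Developer support tickets | 47/month | 8/month |
| API call error rate | 12% | 2.3% |
| Developer satisfaction score | 3.2/5 | 4.6/5 |
| SDK downloads | N/A | 1200/month |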
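11. FAQ
Q1: Do streaming and non-streaming endpoints need to be maintained separately?
No. Spring AI's ChatClient exposes both styles: `.stream().content()` returns a Flux<String> and `.call().content()` returns the complete String. Both endpoints can share one set of business logic, and the controller layer branches on the request parameter stream=true/false to pick the response type.
Q2: What happens when an SSE connection drops?
SSE has a resume mechanism built into the client side: the client remembers the id field of each message it receives and sends it in the Last-Event-ID header when it reconnects. The server still has to keep recent events and resend the ones after that ID. In Spring WebFlux, ServerSentEvent.id() sets the message ID, but the replay logic is up to you.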
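To make the server side of that answer concrete, here is a minimal in-memory sketch of a replay buffer keyed by numeric event IDs. `SseReplayBuffer` is a hypothetical class, and the assumptions (one buffer per stream, sequential IDs, a bounded window) are mine rather than the article's.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Keeps the most recent events of one stream so they can be replayed after a reconnect. */
final class SseReplayBuffer {
    record Entry(long id, String data) {}

    private final Deque<Entry> entries = new ArrayDeque<>();
    private final int capacity;
    private long nextId = 0;

    SseReplayBuffer(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
    }

    /** Stores an event and returns the ID to send in the SSE "id:" field. */
    synchronized long append(String data) {
        if (entries.size() == capacity) entries.removeFirst(); // drop the oldest event
        long id = nextId++;
        entries.addLast(new Entry(id, data));
        return id;
    }

    /** Events newer than the Last-Event-ID header; a missing or invalid header replays the whole buffer. */
    synchronized List<Entry> replayAfter(String lastEventId) {
        long last = -1;
        if (lastEventId != null) {
            try {
                last = Long.parseLong(lastEventId.trim());
            } catch (NumberFormatException e) {
                last = -1;
            }
        }
        List<Entry> missed = new ArrayList<>();
        for (Entry e : entries) {
            if (e.id() > last) missed.add(e);
        }
        return missed;
    }
}
```

Events older than the buffer window cannot be replayed, so a client that reconnects too late should be told to restart the request rather than silently receiving a gap.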
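Q3: If the WebHook target URL is unreachable, are messages lost?
A correctly implemented WebHook system does not lose messages. Retry with increasing delays (1 minute → 5 minutes → 30 minutes → 2 hours → 8 hours), 5 retries in total. If all of them fail, persist the message to a dead-letter queue so that operators can retry it manually.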
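The schedule in this answer is a fixed table rather than a pure doubling sequence, so it can be expressed as a lookup. A minimal sketch follows. `WebhookRetrySchedule` is a hypothetical helper, and an empty result is the signal to move the event to the dead-letter queue.

```java
import java.time.Duration;
import java.util.List;
import java.util.Optional;

final class WebhookRetrySchedule {
    /** Delays before retry 1 through retry 5, as listed in the answer above. */
    private static final List<Duration> DELAYS = List.of(
        Duration.ofMinutes(1), Duration.ofMinutes(5), Duration.ofMinutes(30),
        Duration.ofHours(2), Duration.ofHours(8));

    /**
     * @param failedAttempts deliveries that have failed so far (1 = the initial attempt failed)
     * @return the wait before the next retry, or empty when retries are exhausted
     */
    static Optional<Duration> nextDelay(int failedAttempts) {
        if (failedAttempts < 1) throw new IllegalArgumentException("failedAttempts must be >= 1");
        return failedAttempts <= DELAYS.size()
            ? Optional.of(DELAYS.get(failedAttempts - 1))
            : Optional.empty(); // exhausted: hand the event to the dead-letter queue
    }

    /** Total time between the first failure and the last retry. */
    static Duration totalWindow() {
        return DELAYS.stream().reduce(Duration.ZERO, Duration::plus);
    }
}
```

With these delays the last retry happens 10 hours 36 minutes after the first failure, which is the outage length a receiver can survive without manual intervention.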
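Q4: Should API rate limiting use Redis or memory?
For development or a single instance: in-memory (Bucket4j). For multi-instance production: Redis (Bucket4j with a Redis backend, or Spring Cloud Gateway's RedisRateLimiter), so that all instances share one counter. Otherwise each instance counts independently and the traffic actually admitted is N times the configured limit.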
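The "N times" effect is easy to reproduce. The sketch below uses a deliberately simple interval-refill token bucket written in plain Java (it is not Bucket4j, and `SimpleTokenBucket` is a hypothetical class) with an injected clock so the behavior is deterministic.

```java
/** Interval-refill token bucket: the bucket is topped up to capacity once per period. */
final class SimpleTokenBucket {
    private final long capacity;
    private final long refillPeriodNanos;
    private long tokens;
    private long lastRefillNanos;

    SimpleTokenBucket(long capacity, long refillPeriodNanos, long nowNanos) {
        this.capacity = capacity;
        this.refillPeriodNanos = refillPeriodNanos;
        this.tokens = capacity;
        this.lastRefillNanos = nowNanos;
    }

    /** Tries to take one token at the given time; returns false when the bucket is empty. */
    synchronized boolean tryConsume(long nowNanos) {
        long elapsedPeriods = (nowNanos - lastRefillNanos) / refillPeriodNanos;
        if (elapsedPeriods > 0) {
            tokens = capacity;                                   // refill once a full period has passed
            lastRefillNanos += elapsedPeriods * refillPeriodNanos;
        }
        if (tokens == 0) return false;
        tokens--;
        return true;
    }
}
```

If two service instances each hold their own `SimpleTokenBucket(60, ...)` and a load balancer alternates between them, 120 requests per minute are admitted for the same API key. Moving the counter into a shared store restores the intended limit of 60.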
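Q5: How do you keep the OpenAPI documentation consistent with the code?
A code-first approach is recommended: annotate Spring MVC controllers with @Operation/@Schema and let Springdoc scan them and generate the OpenAPI YAML. This avoids hand-written YAML drifting out of sync with the code. In the CI pipeline, add openapi-diff to compare the old and new versions for breaking changes.
Q6: How do AI error codes map to HTTP status codes?
AI-specific errors are generally describable 4xx errors (parameter problems, quota problems, content filtering). Return 503 when the model is unavailable and 504 when the model times out. Reserve 500 for truly unexpected exceptions. Note: when content is filtered, distinguish it with a business-level error code rather than a 500, because filtering is the system working as intended.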
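On the client side, this mapping turns into a small decision: fix the request, retry later, or report a bug. The sketch below is a coarse default under my own assumptions (`RetryAdvisor` and the `Action` names are hypothetical). Individual codes such as TASK_NOT_COMPLETED (409) may deserve their own handling.

```java
import java.util.Set;

final class RetryAdvisor {
    enum Action { FIX_REQUEST, RETRY_LATER, REPORT }

    /** Statuses that signal a temporary condition: rate limit, model unavailable, model timeout. */
    private static final Set<Integer> RETRYABLE = Set.of(429, 503, 504);

    /** Coarse default reaction to an error status. */
    static Action adviseFor(int httpStatus) {
        if (RETRYABLE.contains(httpStatus)) return Action.RETRY_LATER;
        if (httpStatus >= 400 && httpStatus < 500) return Action.FIX_REQUEST; // the caller must change something
        return Action.REPORT;                                                 // unexpected server-side failure
    }

    /** Uses Retry-After when it is a plain number of seconds, otherwise the fallback. */
    static long delaySeconds(String retryAfterHeader, long fallbackSeconds) {
        if (retryAfterHeader == null) return fallbackSeconds;
        try {
            return Math.max(0, Long.parseLong(retryAfterHeader.trim()));
        } catch (NumberFormatException e) {
            return fallbackSeconds; // Retry-After may also be an HTTP date; not handled in this sketch
        }
    }
}
```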
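Summary
Good AI API design puts developer experience at the center:
- Streaming first: SSE lets users see the AI "thinking", which feels far better than waiting
- Transparent async: a clear task state machine plus a sensible Retry-After, so developers never have to guess
- Semantic errors: AI-specific error codes tell developers how to react
- Automated documentation: OpenAPI plus generated SDKs lower the barrier to integration
- Friendly rate limiting: tell developers how much quota is left instead of failing without warning
From a 2-day integration to a 2-hour one: that is the value of good API design.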
