API Design in the AI Era: Designing RESTful Interfaces That Work Well for AI

老张 (Lao Zhang) · 2026/9/14 · about 21 min read · API Design, RESTful, OpenAPI, Spring AI, Java

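From a "2-day integration" to a "2-hour integration": a story

In September 2025, Wang Fang sat staring at a Slack message.

She was the Tech Lead of the AI platform team at a SaaS company. The team had spent 3 months building an AI document analysis API: upload a document, process it asynchronously, query the result, fetch an AI summary. It was powerful, and internal testing went perfectly.

Then they opened the API to external partners.

Xiao Liu, a front-end engineer at the first partner, needed 2 days to integrate and sent back a long list of complaints:

"Your streaming endpoint returns JSON one moment and plain text the next. I have no idea how to parse it."
"The task polling endpoint never tells me how often to poll, so I polled every second and took your service down."
"Every response comes back with status 200. How am I supposed to tell success from failure?"
"Parameter names are snake_case in the docs but camelCase in the code. Which one is right?"

In the retrospective, the team found the root cause: the API had been designed with an "internal API" mindset. As long as their own people could understand it, that was good enough, and nobody had considered the integration experience of outside developers.

The team spent 2 weeks redesigning the API and refactored it thoroughly according to the conventions described in this article. The second partner finished integration in 2 hours, with zero complaints.

That is the value of good API design.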

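1. The Special Challenges of AI API Design

1.1 Core differences between traditional REST APIs and AI APIs

Dimension	Traditional REST API	AI API
Response time	Milliseconds, synchronous	Milliseconds to minutes, highly variable
Response content	Structured JSON	Unstructured text, streamed
Idempotency	Usually safe to retry	Each call may return a different result (non-deterministic)
Cost	Fixed	Billed per token, hard to estimate
Progress feedback	No intermediate state needed	Real-time progress needed
Error types	Mostly system errors	Adds AI-specific errors such as content safety and token limit exceeded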

1.2 Three typical scenarios for AI APIs

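2. Streaming Response API Design: SSE Conventions

2.1 SSE protocol basics

Server-Sent Events (SSE) is usually the best fit for streaming AI responses. Compared with WebSocket:

One-way push (server → client), which matches the AI generation scenario
Runs over plain HTTP, so it passes through firewalls and proxies and needs no protocol-upgrade handshake
Built-in automatic reconnection
Natively supported by all modern browsers through EventSource (EventSource can only send GET requests, so a POST endpoint like the one in 2.3 has to be read with fetch or another streaming HTTP client)

SSE message format: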

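# Standard SSE format
id: <event ID, used to resume after a reconnect>
event: <event type, optional>
data: <payload; in this API always a single line of JSON>
retry: <reconnect delay in milliseconds, optional>
\n\n  # a blank line ends the message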
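
To make the frame layout above concrete, here is a minimal sketch of a formatter that writes one SSE message. `SseFrame` and its `format` method are hypothetical helpers introduced only for illustration; in a Spring application the framework's `ServerSentEvent` does this work for you.

```java
import java.util.Objects;

/** Builds one SSE frame in the layout described above (illustrative helper, not part of the article's code). */
final class SseFrame {

    private SseFrame() {}

    /**
     * @param id      event ID, or null to omit the field
     * @param event   event type, or null to omit the field
     * @param data    payload; written as a single data line, so it must not contain line breaks
     * @param retryMs reconnect delay in milliseconds, or null to omit the field
     */
    static String format(String id, String event, String data, Long retryMs) {
        Objects.requireNonNull(data, "data");
        if (data.contains("\n") || data.contains("\r")) {
            throw new IllegalArgumentException("data must be a single line");
        }
        StringBuilder sb = new StringBuilder();
        if (id != null) sb.append("id: ").append(id).append('\n');
        if (event != null) sb.append("event: ").append(event).append('\n');
        sb.append("data: ").append(data).append('\n');
        if (retryMs != null) sb.append("retry: ").append(retryMs).append('\n');
        sb.append('\n'); // the blank line terminates the frame
        return sb.toString();
    }
}
```

Rejecting line breaks in `data` enforces the "one line of JSON" convention on the server side, so clients never have to reassemble a payload from several `data:` lines.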

2.2 AI streaming event types

# A chunk of data during streaming generation
event: chunk
data: {"type":"chunk","content":"Java虚拟线程","index":0,"model":"gpt-4o"}

# Generation finished
event: done
data: {"type":"done","usage":{"inputTokens":245,"outputTokens":892,"totalTokens":1137},"finishReason":"stop"}

# Error event
event: error
data: {"type":"error","code":"CONTENT_FILTER_TRIGGERED","message":"内容被安全策略过滤","requestId":"req_abc123"}

# Heartbeat (keeps the connection from timing out)
event: ping
data: {"type":"ping","timestamp":1726192800000}
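
On the client side, the event types above are told apart by the `event:` field. The following is a simplified sketch of a parser that splits a received stream into events. `SseParser` is a hypothetical name, the parser assumes `\n` line endings, and it resets the ID for every frame, which is simpler than what the SSE specification prescribes.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits raw SSE text into events (simplified sketch; assumes "\n" line endings). */
final class SseParser {

    /** One parsed event: optional ID, event type (defaults to "message"), and data payload. */
    record Event(String id, String type, String data) {}

    private SseParser() {}

    static List<Event> parse(String stream) {
        List<Event> events = new ArrayList<>();
        String id = null;
        String type = "message";
        StringBuilder data = new StringBuilder();
        boolean hasData = false;

        for (String line : stream.split("\n", -1)) {
            if (line.isEmpty()) {
                // A blank line dispatches the event collected so far
                if (hasData) events.add(new Event(id, type, data.toString()));
                id = null;
                type = "message";
                data.setLength(0);
                hasData = false;
                continue;
            }
            if (line.startsWith(":")) continue; // SSE comment line, ignored

            int colon = line.indexOf(':');
            String field = colon < 0 ? line : line.substring(0, colon);
            String value = colon < 0 ? "" : line.substring(colon + 1);
            if (value.startsWith(" ")) value = value.substring(1); // strip one leading space

            switch (field) {
                case "id" -> id = value;
                case "event" -> type = value;
                case "data" -> {
                    if (hasData) data.append('\n'); // multiple data lines are joined with "\n"
                    data.append(value);
                    hasData = true;
                }
                default -> { } // unknown fields (including retry) are ignored in this sketch
            }
        }
        return events;
    }
}
```

A caller would typically switch on `Event.type()`: append `chunk` content to the output, stop on `done` or `error`, and ignore `ping`.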

2.3 A complete Spring Boot SSE controller

// AiStreamController.java
package com.laozhang.api.controller;

import com.laozhang.api.model.*;
import com.laozhang.api.service.AiStreamService;
import com.laozhang.api.validation.ValidRequest;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.media.Content;
import io.swagger.v3.oas.annotations.media.Schema;
import io.swagger.v3.oas.annotations.responses.ApiResponse;
import io.swagger.v3.oas.annotations.tags.Tag;
import jakarta.validation.Valid;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.http.MediaType;
import org.springframework.http.codec.ServerSentEvent;
import org.springframework.validation.annotation.Validated;
import org.springframework.web.bind.annotation.*;
import reactor.core.publisher.Flux;

import java.time.Duration;
import java.util.UUID;

@RestController
@RequestMapping("/api/v2/ai")
@Tag(name = "AI生成接口", description = "支持流式和非流式的AI内容生成")
@Validated
@Slf4j
@RequiredArgsConstructor
public class AiStreamController {

    private final AiStreamService aiStreamService;

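    /**
     * Streaming text generation endpoint (SSE)
     *
     * Use cases: real-time chat, code completion, short text generation
     * (expected to finish within 3 seconds)
     * Response format: Server-Sent Events
     *
     * Event types:
     *   - status: processing has started (sent first)
     *   - chunk: a block of generated text
     *   - done: generation finished (includes token usage)
     *   - error: generation failed
     *   - ping: keep-alive heartbeat (every 30 seconds)
     */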
    @PostMapping(
        value = "/generate/stream",
        produces = MediaType.TEXT_EVENT_STREAM_VALUE
    )
    @Operation(
        summary = "流式AI文本生成",
        description = "使用SSE实时推送生成内容，适合对话场景"
    )
    @ApiResponse(
        responseCode = "200",
        description = "流式内容（SSE格式）",
        content = @Content(
            mediaType = "text/event-stream",
            schema = @Schema(implementation = StreamChunkEvent.class)
        )
    )
    public Flux<ServerSentEvent<Object>> generateStream(
            @Valid @RequestBody StreamGenerateRequest request,
            @RequestHeader(value = "X-Request-Id",
                          defaultValue = "") String requestId) {

        // Generate a request ID if the client did not supply one
        String reqId = requestId.isBlank()
            ? UUID.randomUUID().toString()
            : requestId;

        log.info("收到流式生成请求 requestId={}, model={}, promptLength={}",
            reqId, request.model(), request.prompt().length());

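        // Content events: status -> chunks -> done. Any failure is converted
        // into a single "error" event, so the client always sees a terminal event.
        Flux<ServerSentEvent<Object>> events = Flux.concat(
            // 1. Announce that processing has started
            Flux.just(buildStatusEvent(reqId, "processing")),

            // 2. Stream the generated content
            aiStreamService.streamGenerate(request, reqId)
                .map(chunk -> ServerSentEvent.<Object>builder()
                    .id(String.valueOf(chunk.index()))
                    .event("chunk")
                    .data(chunk)
                    .build()),

            // 3. Send the completion event
            aiStreamService.getUsageStats(reqId)
                .map(usage -> ServerSentEvent.<Object>builder()
                    .event("done")
                    .data(new DoneEvent("done", usage, "stop"))
                    .build())
        )
        .doOnError(e -> log.error("流式生成失败 requestId={}", reqId, e))
        .onErrorResume(e -> Flux.just(
            ServerSentEvent.<Object>builder()
                .event("error")
                .data(buildErrorEvent(reqId, e))
                .build()));

        // Heartbeat every 30 seconds. Flux.interval never completes on its own,
        // so the merged stream is cut off right after the first terminal event.
        Flux<ServerSentEvent<Object>> heartbeat = Flux.interval(Duration.ofSeconds(30))
            .map(tick -> ServerSentEvent.<Object>builder()
                .event("ping")
                .data(new PingEvent("ping", System.currentTimeMillis()))
                .build());

        return Flux.merge(events, heartbeat)
            .takeUntil(sse -> "done".equals(sse.event()) || "error".equals(sse.event()));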
    }

    /**
     * Non-streaming AI generation (waits synchronously for the full response)
     * Suited to short text and structured output
     */
    @PostMapping("/generate")
    @Operation(summary = "同步AI文本生成", description = "适合短文本场景，最多等待30秒")
    public GenerateResponse generate(
            @Valid @RequestBody GenerateRequest request) {

        return aiStreamService.generate(request);
    }

    private ServerSentEvent<Object> buildStatusEvent(String reqId, String status) {
        return ServerSentEvent.<Object>builder()
            .event("status")
            .data(new StatusEvent("status", reqId, status,
                System.currentTimeMillis()))
            .build();
    }

    private ErrorEvent buildErrorEvent(String reqId, Object error) {
        return new ErrorEvent("error", "INTERNAL_ERROR",
            "生成过程中发生错误", reqId);
    }
}

2.4 Request/response DTO design

// StreamGenerateRequest.java
package com.laozhang.api.model;

import io.swagger.v3.oas.annotations.media.Schema;
import jakarta.validation.constraints.*;

@Schema(description = "流式生成请求")
public record StreamGenerateRequest(

    @NotBlank(message = "prompt不能为空")
    @Size(max = 32000, message = "prompt不能超过32000字符")
    @Schema(description = "用户输入的提示词", example = "帮我分析一下Java虚拟线程的优缺点")
    String prompt,

    @Schema(description = "使用的AI模型", example = "gpt-4o",
            allowableValues = {"gpt-4o", "gpt-4o-mini", "claude-3-5-sonnet"})
    @Pattern(regexp = "gpt-4o|gpt-4o-mini|claude-3-5-sonnet",
             message = "不支持的模型")
    String model,

    @Min(value = 1, message = "maxTokens最小为1")
    @Max(value = 4096, message = "maxTokens最大为4096")
    @Schema(description = "最大生成Token数", example = "1024", defaultValue = "1024")
    Integer maxTokens,

    @DecimalMin(value = "0.0", message = "temperature最小为0")
    @DecimalMax(value = "2.0", message = "temperature最大为2")
    @Schema(description = "随机性控制，0为确定性，2为高随机", example = "0.7")
    Double temperature,

    @Schema(description = "是否开启流式响应", defaultValue = "true")
    Boolean stream,

    @Schema(description = "用户标识，用于限流和审计")
    String userId,

    @Schema(description = "会话ID，用于多轮对话上下文")
    String sessionId
) {
    // Apply defaults for omitted fields
    public StreamGenerateRequest {
        if (model == null || model.isBlank()) model = "gpt-4o";
        if (maxTokens == null) maxTokens = 1024;
        if (temperature == null) temperature = 0.7;
        if (stream == null) stream = true;
    }
}

// StreamChunkEvent.java
@Schema(description = "流式内容块事件")
public record StreamChunkEvent(
    @Schema(description = "事件类型，固定为chunk") String type,
    @Schema(description = "本次生成的文本内容") String content,
    @Schema(description = "当前块的序号，从0开始") int index,
    @Schema(description = "使用的模型名称") String model
) {}

// DoneEvent.java
@Schema(description = "生成完成事件")
public record DoneEvent(
    @Schema(description = "事件类型，固定为done") String type,
    @Schema(description = "Token使用统计") TokenUsage usage,
    @Schema(description = "停止原因：stop/length/content_filter") String finishReason
) {}

// TokenUsage.java
@Schema(description = "Token使用统计")
public record TokenUsage(
    @Schema(description = "输入Token数") int inputTokens,
    @Schema(description = "输出Token数") int outputTokens,
    @Schema(description = "总Token数") int totalTokens,
    @Schema(description = "估算成本（美元）") double estimatedCostUsd
) {}

3. Async Long-Running Task API Design: the Polling Pattern

3.1 A three-step async API

3.2 A complete async task API implementation

// AsyncTaskController.java
package com.laozhang.api.controller;

import com.laozhang.api.model.*;
import com.laozhang.api.service.AsyncTaskService;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.headers.Header;
import io.swagger.v3.oas.annotations.responses.ApiResponse;
import jakarta.validation.Valid;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.servlet.support.ServletUriComponentsBuilder;

import java.net.URI;

@RestController
@RequestMapping("/api/v2/tasks")
@Slf4j
@RequiredArgsConstructor
public class AsyncTaskController {

    private final AsyncTaskService taskService;

    /**
     * Submit an async AI analysis task
     * Returns 202 Accepted with the task ID and the polling URL
     */
    @PostMapping
    @Operation(summary = "提交AI分析任务",
               description = "提交后立即返回，通过taskId轮询状态")
    @ApiResponse(
        responseCode = "202",
        description = "任务已接受，开始排队处理",
        headers = {
            @Header(name = "Location",
                    description = "任务状态查询URL"),
            @Header(name = "Retry-After",
                    description = "建议首次查询等待时间（秒）")
        }
    )
    public ResponseEntity<TaskSubmitResponse> submitTask(
            @Valid @RequestBody TaskSubmitRequest request) {

        TaskSubmitResponse task = taskService.submit(request);

        // Location header: tells the client where to query the task status
        URI location = ServletUriComponentsBuilder
            .fromCurrentRequest()
            .path("/{taskId}/status")
            .buildAndExpand(task.taskId())
            .toUri();

        return ResponseEntity
            .status(HttpStatus.ACCEPTED)
            .location(location)
            .header("Retry-After", estimateWaitSeconds(request))
            .body(task);
    }

    /**
     * Query the task status
     */
    @GetMapping("/{taskId}/status")
    @Operation(summary = "查询任务处理状态")
    @ApiResponse(responseCode = "200", description = "任务状态信息")
    @ApiResponse(responseCode = "404", description = "任务不存在")
    public ResponseEntity<TaskStatusResponse> getTaskStatus(
            @PathVariable String taskId) {

        TaskStatusResponse status = taskService.getStatus(taskId);

        ResponseEntity.BodyBuilder builder = ResponseEntity.ok();

        // While the task is still running, add Retry-After to suggest when to poll next
        if (status.isInProgress()) {
            int retryAfter = calculateRetryAfter(status.progress());
            builder.header("Retry-After", String.valueOf(retryAfter));
        }

        return builder.body(status);
    }

    /**
     * Fetch the task result (only callable once the status is completed)
     */
    @GetMapping("/{taskId}/result")
    @Operation(summary = "获取任务处理结果",
               description = "任务状态为completed后才可调用，结果保存72小时")
    @ApiResponse(responseCode = "200", description = "AI分析结果")
    @ApiResponse(responseCode = "404", description = "任务不存在")
    @ApiResponse(responseCode = "409", description = "任务尚未完成")
    public ResponseEntity<TaskResultResponse> getTaskResult(
            @PathVariable String taskId) {

        return taskService.getResult(taskId)
            .map(ResponseEntity::ok)
            .orElseThrow(() -> new TaskNotCompletedException(taskId));
    }

    /**
     * Cancel a task that is still in progress
     */
    @DeleteMapping("/{taskId}")
    @Operation(summary = "取消任务")
    @ApiResponse(responseCode = "200", description = "任务已取消")
    @ApiResponse(responseCode = "409", description = "任务已完成，无法取消")
    public ResponseEntity<TaskCancelResponse> cancelTask(
            @PathVariable String taskId) {

        taskService.cancel(taskId);
        return ResponseEntity.ok(
            new TaskCancelResponse(taskId, "cancelled",
                System.currentTimeMillis())
        );
    }

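    /**
     * Estimate the initial wait time from the file size
     */
    private String estimateWaitSeconds(TaskSubmitRequest request) {
        long fileSizeKb = request.fileSizeBytes() / 1024;
        // Rough estimate: about 0.1 s of processing per KB, clamped to 5..60 s
        int estimatedSeconds = Math.max(5, (int)(fileSizeKb * 0.1));
        return String.valueOf(Math.min(estimatedSeconds, 60));
    }

    /**
     * Pick the next polling interval from the task's progress: the closer the
     * task is to completion, the sooner the client should check again.
     * (This is a progress-based schedule, not exponential backoff.)
     */
    private int calculateRetryAfter(int progress) {
        if (progress < 10) return 5;        // just started: check again in 5 s
        if (progress < 50) return 3;        // in progress: check again in 3 s
        if (progress < 90) return 2;        // nearly done: check again in 2 s
        return 1;                           // about to finish: check again in 1 s
    }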
}
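
The controller above only helps if clients actually honor `Retry-After`. Below is a rough sketch of a polling client built on the JDK's `java.net.http.HttpClient`. `TaskPoller`, the 5-second fallback, the 60-second cap, and the string match on the JSON body are all assumptions made for this example; a real client would parse the JSON properly.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Optional;

/** Polls a task status URL and waits as long as the server's Retry-After header suggests. */
final class TaskPoller {

    private static final Duration FALLBACK = Duration.ofSeconds(5);   // assumed default
    private static final Duration MAX_DELAY = Duration.ofSeconds(60); // assumed upper bound

    private TaskPoller() {}

    /** Converts a Retry-After value in seconds into a delay; falls back when missing or malformed. */
    static Duration nextDelay(Optional<String> retryAfter) {
        try {
            long seconds = Long.parseLong(retryAfter.orElse("").trim());
            if (seconds < 1) return Duration.ofSeconds(1);
            return seconds > MAX_DELAY.toSeconds() ? MAX_DELAY : Duration.ofSeconds(seconds);
        } catch (NumberFormatException e) {
            return FALLBACK;
        }
    }

    /** Polls statusUrl until the body no longer reports queued/processing, then returns the body. */
    static String pollUntilDone(HttpClient client, URI statusUrl, int maxAttempts)
            throws IOException, InterruptedException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            HttpRequest request = HttpRequest.newBuilder(statusUrl).GET().build();
            HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() != 200) {
                throw new IllegalStateException("Unexpected status " + response.statusCode());
            }
            String body = response.body();
            // Naive string check to stay dependency-free
            boolean inProgress = body.contains("\"status\":\"queued\"")
                    || body.contains("\"status\":\"processing\"");
            if (!inProgress) return body;
            Thread.sleep(nextDelay(response.headers().firstValue("Retry-After")).toMillis());
        }
        throw new IllegalStateException("Task did not finish after " + maxAttempts + " polls");
    }
}
```

Clamping the delay protects the client from a malformed or extreme header value, and the attempt limit keeps a stuck task from being polled forever.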

3.3 Task status model

// TaskStatusResponse.java
package com.laozhang.api.model;

import io.swagger.v3.oas.annotations.media.Schema;

@Schema(description = "任务状态响应")
public record TaskStatusResponse(

    @Schema(description = "任务ID", example = "task_2x9k3m4n")
    String taskId,

    @Schema(
        description = "任务状态",
        allowableValues = {"queued", "processing", "completed", "failed", "cancelled"},
        example = "processing"
    )
    String status,

    @Schema(description = "处理进度 0-100", example = "45")
    int progress,

    @Schema(description = "任务提交时间（Unix毫秒）")
    long submittedAt,

    @Schema(description = "任务开始处理时间（Unix毫秒），排队中为null")
    Long startedAt,

    @Schema(description = "任务完成时间（Unix毫秒），未完成为null")
    Long completedAt,

    @Schema(description = "预估完成时间（Unix毫秒），仅处理中时有值")
    Long estimatedCompletedAt,

    @Schema(description = "失败原因，仅status=failed时有值")
    String failureReason,

    @Schema(description = "任务队列位置，仅status=queued时有值")
    Integer queuePosition
) {
    public boolean isInProgress() {
        return "queued".equals(status) || "processing".equals(status);
    }

    public boolean isCompleted() {
        return "completed".equals(status);
    }
}

4. WebHook Design: Pushing a Notification When an AI Task Finishes

4.1 WebHook API design principles

// WebhookController.java
package com.laozhang.api.controller;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.laozhang.api.model.*;
import com.laozhang.api.service.WebhookDeliveryService;
import io.swagger.v3.oas.annotations.Operation;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

/**
 * WebHook registration and management endpoints
 * Clients register a callback URL and are notified when a task finishes
 */
@RestController
@RequestMapping("/api/v2/webhooks")
@Slf4j
@RequiredArgsConstructor
public class WebhookController {

    private final WebhookDeliveryService deliveryService;

    /**
     * Register a WebHook
     */
    @PostMapping
    @Operation(summary = "注册WebHook回调")
    public ResponseEntity<WebhookRegistration> registerWebhook(
            @RequestBody WebhookRegisterRequest request) {

        WebhookRegistration registration = deliveryService.register(request);

        // Right after registration, send a test event to verify the URL is reachable
        deliveryService.sendVerificationEvent(registration.webhookId());

        return ResponseEntity.ok(registration);
    }

    /**
     * Query the WebHook delivery history (for troubleshooting)
     */
    @GetMapping("/{webhookId}/deliveries")
    @Operation(summary = "查询WebHook投递记录")
    public ResponseEntity<WebhookDeliveryPage> getDeliveries(
            @PathVariable String webhookId,
            @RequestParam(defaultValue = "0") int page,
            @RequestParam(defaultValue = "20") int size) {

        return ResponseEntity.ok(
            deliveryService.getDeliveries(webhookId, page, size)
        );
    }
}

4.2 WebHook signature verification (security-critical)

// WebhookDeliveryService.java
package com.laozhang.api.service;

import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;
import org.springframework.web.reactive.function.client.WebClient;

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.InvalidKeyException;
import java.security.NoSuchAlgorithmException;
import java.time.Instant;
import java.util.Base64;
import java.util.Map;

@Service
@Slf4j
@RequiredArgsConstructor
public class WebhookDeliveryService {

    private final WebClient.Builder webClientBuilder;
    private final WebhookRepository webhookRepository;

    /**
     * Send a WebHook notification asynchronously
     * The request is signed and retried on failure
     */
    @Async("webhookExecutor")
    public void deliverWebhook(String webhookId, WebhookEvent event) {
        WebhookRegistration webhook = webhookRepository.findById(webhookId)
            .orElseThrow(() -> new WebhookNotFoundException(webhookId));

        String payload = serializeEvent(event);
        String timestamp = String.valueOf(Instant.now().getEpochSecond());
        String signature = computeSignature(payload, timestamp, webhook.secret());

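        log.info("投递WebHook webhookId={}, eventType={}",
            webhookId, event.eventType());

        // Build the client without a base URL and pass the target as the request URI.
        // Calling baseUrl() on the builder held by this singleton would mutate shared
        // state, so concurrent deliveries could end up posting to each other's URL.
        webClientBuilder
            .build()
            .post()
            .uri(webhook.targetUrl())
            .headers(headers -> {
                // Standard WebHook security headers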
                headers.set("Content-Type", "application/json");
                headers.set("X-Webhook-Id", webhookId);
                headers.set("X-Webhook-Timestamp", timestamp);
                headers.set("X-Webhook-Signature", "v1=" + signature);
                headers.set("X-Webhook-Event", event.eventType());
                // Idempotency key: a retried event carries the same key, so receivers can skip duplicates
                headers.set("X-Idempotency-Key", event.idempotencyKey());
            })
            .bodyValue(payload)
            .retrieve()
            .toBodilessEntity()
            .retryWhen(buildRetrySpec())
            .subscribe(
                resp -> log.info("WebHook投递成功 webhookId={}, status={}",
                    webhookId, resp.getStatusCode()),
                err -> log.error("WebHook投递最终失败 webhookId={}", webhookId, err)
            );
    }

    /**
     * Compute the HMAC-SHA256 signature
     * Receivers verify with: signature = Base64(HMAC-SHA256(secret, timestamp + "." + payload))
     */
    private String computeSignature(String payload, String timestamp, String secret) {
        try {
            String message = timestamp + "." + payload;
            Mac mac = Mac.getInstance("HmacSHA256");
            SecretKeySpec keySpec = new SecretKeySpec(
                secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256");
            mac.init(keySpec);
            byte[] signature = mac.doFinal(
                message.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(signature);
        } catch (NoSuchAlgorithmException | InvalidKeyException e) {
            throw new RuntimeException("签名计算失败", e);
        }
    }

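    /**
     * Exponential backoff retry: up to 5 attempts, starting at 1 minute and
     * doubling each time (roughly 1, 2, 4, 8, 16 minutes, with 10% jitter),
     * capped at 8 hours per wait
     */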
    private reactor.util.retry.Retry buildRetrySpec() {
        return reactor.util.retry.Retry
            .backoff(5, java.time.Duration.ofMinutes(1))
            .maxBackoff(java.time.Duration.ofHours(8))
            .jitter(0.1);
    }

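    private String serializeEvent(WebhookEvent event) {
        // Serialize with Jackson so the signed payload is exactly the body that is sent.
        // A fresh ObjectMapper keeps this snippet self-contained; in a real service,
        // inject the application's shared ObjectMapper instead.
        try {
            return new com.fasterxml.jackson.databind.ObjectMapper()
                .writeValueAsString(event);
        } catch (com.fasterxml.jackson.core.JsonProcessingException e) {
            throw new IllegalStateException("WebHook事件序列化失败", e);
        }
    }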
}
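
The signature is only useful if the receiver checks it. Here is a minimal sketch of what the receiving side could do with the `X-Webhook-Timestamp` and `X-Webhook-Signature` headers, using only the JDK. `WebhookVerifier` and the 300-second replay window are assumptions for illustration, not something the article's service prescribes.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;

/** Receiver-side check of the "v1=<Base64 HMAC-SHA256>" signature scheme described above. */
final class WebhookVerifier {

    private static final long TOLERANCE_SECONDS = 300; // assumed replay window

    private WebhookVerifier() {}

    /** Recomputes Base64(HMAC-SHA256(secret, timestamp + "." + payload)). */
    static String sign(String secret, String timestamp, String payload) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] digest = mac.doFinal(
                (timestamp + "." + payload).getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 unavailable", e);
        }
    }

    /** Returns true only if the timestamp is fresh and the signature matches the raw body. */
    static boolean verify(String secret, String timestampHeader, String signatureHeader,
                          String rawBody, long nowEpochSeconds) {
        if (timestampHeader == null || signatureHeader == null
                || !signatureHeader.startsWith("v1=")) {
            return false;
        }
        long timestamp;
        try {
            timestamp = Long.parseLong(timestampHeader);
        } catch (NumberFormatException e) {
            return false;
        }
        // Reject stale or future-dated requests to limit replay attacks
        if (Math.abs(nowEpochSeconds - timestamp) > TOLERANCE_SECONDS) return false;

        byte[] expected = sign(secret, timestampHeader, rawBody).getBytes(StandardCharsets.UTF_8);
        byte[] actual = signatureHeader.substring(3).getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, actual); // constant-time comparison
    }
}
```

The check must run against the raw request body, before any JSON parsing or re-serialization, because even a whitespace difference changes the HMAC.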

4.3 WebHook event format

// WebhookEvent.java
package com.laozhang.api.model;

import io.swagger.v3.oas.annotations.media.Schema;

@Schema(description = "WebHook事件载荷")
public record WebhookEvent(

    @Schema(description = "WebHook ID") String webhookId,

    @Schema(
        description = "事件类型",
        allowableValues = {
            "task.completed",
            "task.failed",
            "task.cancelled",
            "webhook.verification"
        }
    )
    String eventType,

    @Schema(description = "事件触发时间（ISO 8601）")
    String timestamp,

    @Schema(description = "幂等Key，相同事件重发时Key不变，客户端去重用")
    String idempotencyKey,

    @Schema(description = "API版本") String apiVersion,

    @Schema(description = "事件数据") Object data
) {}
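
Because failed deliveries are retried, a receiver may see the same event more than once; the `idempotencyKey` field exists so that it can drop the repeats. A minimal in-memory sketch follows. `WebhookDeduplicator` is a hypothetical name, and the set grows without bound, so a real receiver would more likely use a store with expiry.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Remembers idempotency keys so that redelivered events are processed only once. */
final class WebhookDeduplicator {

    // Thread-safe set; unbounded in this sketch
    private final Set<String> seenKeys = ConcurrentHashMap.newKeySet();

    /** Returns true the first time a key is seen and false for every redelivery. */
    boolean firstDelivery(String idempotencyKey) {
        return seenKeys.add(idempotencyKey);
    }
}
```

A handler would call `firstDelivery` before doing any work, and answer a duplicate with a 2xx status so the sender stops retrying.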

5. OpenAPI 3.0 Documentation: A Standardized Description of AI Endpoints

5.1 OpenAPI YAML example (key parts)

# openapi.yaml
openapi: 3.0.3
info:
  title: AI文档分析平台 API
  description: |
    ## 概述
    提供AI驱动的文档分析、摘要生成、内容提取等功能。

    ## 认证方式
    所有接口需要Bearer Token认证：
    ```
    Authorization: Bearer your_api_key
    ```

    ## 限流规则
    - 标准版：60次/分钟，10000次/天
    - 专业版：300次/分钟，100000次/天
    - 超出限制返回429状态码，Header中包含重置时间

    ## 流式接口说明
    流式接口（/generate/stream）使用Server-Sent Events（SSE）格式，
    需要客户端处理`text/event-stream`响应类型。

  version: "2.0.0"
  contact:
    name: API Support
    email: api-support@example.com

servers:
  - url: https://api.example.com/api/v2
    description: 生产环境
  - url: https://sandbox.api.example.com/api/v2
    description: 沙箱环境

paths:
  /ai/generate/stream:
    post:
      summary: 流式AI文本生成
      operationId: generateStream
      tags: [AI生成]
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/StreamGenerateRequest'
            examples:
              basicChat:
                summary: 基础对话示例
                value:
                  prompt: "帮我解释Java虚拟线程的原理"
                  model: "gpt-4o"
                  maxTokens: 1024
                  temperature: 0.7
      responses:
        '200':
          description: 流式内容（SSE格式）
          content:
            text/event-stream:
              schema:
                type: string
              example: |
                event: chunk
                data: {"type":"chunk","content":"Java虚","index":0}

                event: done
                data: {"type":"done","usage":{"inputTokens":45,"outputTokens":892}}
        '400':
          $ref: '#/components/responses/BadRequest'
        '429':
          $ref: '#/components/responses/RateLimitExceeded'

components:
  schemas:
    StreamGenerateRequest:
      type: object
      required: [prompt]
      properties:
        prompt:
          type: string
          maxLength: 32000
          description: 用户提示词
          example: "帮我分析Java虚拟线程的优缺点"
        model:
          type: string
          enum: [gpt-4o, gpt-4o-mini, claude-3-5-sonnet]
          default: gpt-4o
          description: AI模型选择
        maxTokens:
          type: integer
          minimum: 1
          maximum: 4096
          default: 1024
          description: 最大生成Token数
        temperature:
          type: number
          minimum: 0
          maximum: 2
          default: 0.7
          description: 随机性控制

    AiError:
      type: object
      required: [code, message, requestId]
      properties:
        code:
          type: string
          description: 错误码（机器可读）
          example: "TOKEN_LIMIT_EXCEEDED"
        message:
          type: string
          description: 错误说明（人类可读）
          example: "输入文本超过模型最大Token限制(32000个Token)"
        requestId:
          type: string
          description: 请求ID，用于客服排查
          example: "req_2x9k3m4n5p"
        details:
          type: object
          description: 错误详情（可选）

  responses:
    BadRequest:
      description: 请求参数错误
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/AiError'

    RateLimitExceeded:
      description: 超出限流配额
      headers:
        X-RateLimit-Limit:
          schema:
            type: integer
          description: 窗口内请求总额度
        X-RateLimit-Remaining:
          schema:
            type: integer
          description: 当前窗口剩余额度
        X-RateLimit-Reset:
          schema:
            type: integer
          description: 配额重置时间（Unix时间戳）
        Retry-After:
          schema:
            type: integer
          description: 建议重试等待时间（秒）
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/AiError'

  securitySchemes:
    BearerAuth:
      type: http
      scheme: bearer
      bearerFormat: API Key

security:
  - BearerAuth: []

5.2 Springdoc OpenAPI integration

// OpenApiConfig.java
package com.laozhang.api.config;

import io.swagger.v3.oas.models.*;
import io.swagger.v3.oas.models.info.Info;
import io.swagger.v3.oas.models.security.SecurityScheme;
import io.swagger.v3.oas.models.servers.Server;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import java.util.List;

@Configuration
public class OpenApiConfig {

    @Bean
    public OpenAPI customOpenAPI() {
        return new OpenAPI()
            .info(new Info()
                .title("AI文档分析平台 API")
                .version("2.0.0")
                .description("""
                    ## AI API设计规范
                    本API遵循以下设计原则：
                    1. **流式优先**：支持SSE实时推送
                    2. **异步友好**：长任务使用轮询/WebHook
                    3. **语义化错误**：AI特有错误码清晰可读
                    4. **客户端友好**：Retry-After引导重试
                    """)
            )
            .servers(List.of(
                new Server().url("https://api.example.com").description("生产环境"),
                new Server().url("http://localhost:8080").description("本地开发")
            ))
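            .components(new Components()
                .addSecuritySchemes("BearerAuth",
                    new SecurityScheme()
                        .type(SecurityScheme.Type.HTTP)
                        .scheme("bearer")
                        .bearerFormat("API Key")))
            // Apply the scheme to every operation, matching the global
            // `security` entry in openapi.yaml
            .addSecurityItem(new io.swagger.v3.oas.models.security.SecurityRequirement()
                .addList("BearerAuth"));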
    }
}

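6. Versioning: A Version Strategy for AI APIs

6.1 Comparing version strategies

Strategy	URL path	Header	Trade-offs
URL path	`/api/v1/generate`	-	Intuitive and cache-friendly, but URLs change with every major version
Accept header	-	`Accept: application/vnd.api.v2+json`	Elegant, but awkward to debug
Custom header	-	`API-Version: 2`	A middle ground

Recommended for AI APIs: support both URL path and header versioning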

// ApiVersionConfig.java
package com.laozhang.api.config;

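import org.springframework.context.annotation.Configuration;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.servlet.config.annotation.PathMatchConfigurer;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;

/**
 * API versioning strategy:
 * 1. Major version (v1/v2): bumped on breaking changes
 * 2. Minor version (via the X-API-Minor-Version header): compatible changes such as new fields
 * 3. Old versions are shut down after 12 months, with Deprecation headers sent 3 months ahead
 */
@Configuration // without this annotation Spring never picks up the configurer
public class ApiVersionConfig implements WebMvcConfigurer {

    @Override
    public void configurePathMatch(PathMatchConfigurer configurer) {
        // Prefix every controller in the com.laozhang.api.v2 package with /api/v2.
        // Controllers that already hard-code the prefix in @RequestMapping must stay
        // outside that package, otherwise the prefix is applied twice.
        configurer.addPathPrefix("/api/v2",
            cls -> cls.isAnnotationPresent(RequestMapping.class)
                && cls.getPackageName().startsWith("com.laozhang.api.v2"));
    }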
}
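
The configuration above covers the URL-path half of the "path plus header" recommendation. As a rough sketch of how the two could be combined, the resolver below prefers the version in the path and falls back to the `API-Version` header from the comparison table. `ApiVersionResolver` and the default of version 2 are assumptions for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Resolves the requested major API version from the URL path, then from a header. */
final class ApiVersionResolver {

    private static final Pattern PATH_VERSION = Pattern.compile("^/api/v(\\d{1,3})(/|$)");
    private static final int DEFAULT_VERSION = 2; // assumed current version

    private ApiVersionResolver() {}

    static int resolve(String path, String apiVersionHeader) {
        // 1. An explicit version in the path wins
        Matcher matcher = PATH_VERSION.matcher(path);
        if (matcher.find()) {
            return Integer.parseInt(matcher.group(1));
        }
        // 2. Otherwise fall back to the API-Version header, if it is a valid number
        if (apiVersionHeader != null) {
            try {
                return Integer.parseInt(apiVersionHeader.trim());
            } catch (NumberFormatException ignored) {
                // fall through to the default
            }
        }
        // 3. Neither is present: use the default version
        return DEFAULT_VERSION;
    }
}
```

Letting the path win keeps URLs unambiguous when they are copied into logs, caches, or bug reports, while the header remains available for clients that prefer stable URLs.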

6.2 API deprecation warning interceptor

// DeprecationInterceptor.java
package com.laozhang.api.interceptor;

import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import org.springframework.web.servlet.HandlerInterceptor;

/**
 * Injects warning headers into responses from deprecated APIs
 * so that clients can plan their migration ahead of time
 */
public class DeprecationInterceptor implements HandlerInterceptor {

    @Override
    public boolean preHandle(HttpServletRequest request,
                              HttpServletResponse response,
                              Object handler) {

        String uri = request.getRequestURI();

        // The v1 API will be shut down on 2027-03-01 (a Monday)
        if (uri.startsWith("/api/v1/")) {
            response.setHeader("Deprecation",
                "Mon, 01 Mar 2027 00:00:00 GMT");
            response.setHeader("Sunset",
                "Mon, 01 Mar 2027 00:00:00 GMT");
            response.setHeader("Link",
                "<https://docs.example.com/migration/v1-to-v2>; rel=\"deprecation\"");
            // Header values should stay ASCII, so the warning text is in English
            response.setHeader("Warning",
                "299 - \"v1 API will be removed on 2027-03-01; please migrate to v2\"");
        }

        return true;
    }
}
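
On the client side, the `Sunset` header can be turned into a countdown for dashboards or build warnings. A small sketch using only `java.time` is shown below. `SunsetCheck` is a hypothetical helper, and it assumes the header carries an HTTP-date as in the interceptor above.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

/** Computes how many whole days remain before an API is shut down, based on its Sunset header. */
final class SunsetCheck {

    private SunsetCheck() {}

    /**
     * @param sunsetHeader value of the Sunset response header, or null if absent
     * @param now          the current instant (passed in to keep the method testable)
     * @return whole days until sunset (negative if already past), or empty if unparseable
     */
    static Optional<Long> daysUntilSunset(String sunsetHeader, Instant now) {
        if (sunsetHeader == null) return Optional.empty();
        try {
            Instant sunset = ZonedDateTime
                .parse(sunsetHeader, DateTimeFormatter.RFC_1123_DATE_TIME)
                .toInstant();
            return Optional.of(Duration.between(now, sunset).toDays());
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }
}
```

A client could log a warning whenever the result drops below some threshold, for example 90 days, so the migration does not come as a surprise.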

7. Error Code Design: Standardizing AI-Specific Errors

7.1 AI error code scheme

// AiErrorCode.java
package com.laozhang.api.error;

import org.springframework.http.HttpStatus;

public enum AiErrorCode {

    // ===== General errors (HTTP 400) =====
    INVALID_PARAMETER(400, "INVALID_PARAMETER", "请求参数无效"),
    MISSING_REQUIRED_FIELD(400, "MISSING_REQUIRED_FIELD", "缺少必填字段"),

    // ===== AI-specific input errors (HTTP 400) =====
    TOKEN_LIMIT_EXCEEDED(400, "TOKEN_LIMIT_EXCEEDED",
        "输入文本超过模型最大Token限制"),
    UNSUPPORTED_LANGUAGE(400, "UNSUPPORTED_LANGUAGE",
        "不支持的语言，当前支持：中文、英文、日文"),
    INVALID_FILE_FORMAT(400, "INVALID_FILE_FORMAT",
        "不支持的文件格式，支持：PDF/Word/TXT/MD"),
    FILE_TOO_LARGE(400, "FILE_TOO_LARGE",
        "文件超过大小限制（最大50MB）"),

    // ===== Authentication / authorization errors (HTTP 401 / 403) =====
    INVALID_API_KEY(401, "INVALID_API_KEY", "API Key无效或已过期"),
    QUOTA_EXHAUSTED(403, "QUOTA_EXHAUSTED",
        "账户Token配额已用尽，请充值或等待配额重置"),
    FEATURE_NOT_AVAILABLE(403, "FEATURE_NOT_AVAILABLE",
        "当前套餐不支持此功能"),

    // ===== AI safety / content errors (HTTP 400; OUTPUT_FILTERED is a warning returned with 200) =====
    CONTENT_POLICY_VIOLATION(400, "CONTENT_POLICY_VIOLATION",
        "输入内容违反内容安全策略"),
    OUTPUT_FILTERED(200, "OUTPUT_FILTERED",
        "生成内容被安全过滤，已返回安全版本"),
    PII_DETECTED(400, "PII_DETECTED",
        "检测到个人敏感信息（PII），请脱敏后重试"),

    // ===== Rate limiting errors (HTTP 429) =====
    RATE_LIMIT_EXCEEDED(429, "RATE_LIMIT_EXCEEDED",
        "请求频率超过限制"),
    CONCURRENT_LIMIT_EXCEEDED(429, "CONCURRENT_LIMIT_EXCEEDED",
        "并发请求数超过限制"),

    // ===== AI model errors (HTTP 5xx) =====
    MODEL_UNAVAILABLE(503, "MODEL_UNAVAILABLE",
        "AI模型服务暂时不可用，请稍后重试"),
    MODEL_TIMEOUT(504, "MODEL_TIMEOUT",
        "AI模型响应超时，请减少输入长度后重试"),
    GENERATION_FAILED(500, "GENERATION_FAILED",
        "AI生成失败，请重试"),

    // ===== Task errors (HTTP 404 / 409 / 410) =====
    TASK_NOT_FOUND(404, "TASK_NOT_FOUND", "任务不存在"),
    TASK_NOT_COMPLETED(409, "TASK_NOT_COMPLETED",
        "任务尚未完成，无法获取结果"),
    TASK_ALREADY_CANCELLED(409, "TASK_ALREADY_CANCELLED",
        "任务已取消"),
    TASK_EXPIRED(410, "TASK_EXPIRED",
        "任务结果已过期（72小时），请重新提交");

    private final int httpStatus;
    private final String code;
    private final String defaultMessage;

    AiErrorCode(int httpStatus, String code, String defaultMessage) {
        this.httpStatus = httpStatus;
        this.code = code;
        this.defaultMessage = defaultMessage;
    }

    public HttpStatus toHttpStatus() {
        return HttpStatus.valueOf(httpStatus);
    }

    public String getCode() { return code; }
    public String getDefaultMessage() { return defaultMessage; }
}
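
Machine-readable codes pay off when clients branch on them. The sketch below shows one way a client might decide whether to retry. `ClientRetryPolicy` is hypothetical, and the grouping is an assumption based on the default messages above (the codes whose messages invite a retry), not a rule defined by the API.

```java
import java.util.Set;

/** Decides on the client side whether a failed call is worth retrying unchanged. */
final class ClientRetryPolicy {

    // Codes that describe transient conditions; chosen for illustration only
    private static final Set<String> RETRYABLE = Set.of(
        "RATE_LIMIT_EXCEEDED",
        "CONCURRENT_LIMIT_EXCEEDED",
        "MODEL_UNAVAILABLE",
        "GENERATION_FAILED"
    );

    private ClientRetryPolicy() {}

    /** Returns true if the same request may succeed later without any change. */
    static boolean isRetryable(String errorCode) {
        return errorCode != null && RETRYABLE.contains(errorCode);
    }
}
```

Everything outside the set, such as `INVALID_API_KEY` or `PII_DETECTED`, needs a change to the request or the account before another attempt makes sense.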

7.2 Unified exception handler

// GlobalExceptionHandler.java
package com.laozhang.api.exception;

import com.laozhang.api.error.AiErrorCode;
import lombok.extern.slf4j.Slf4j;
import org.springframework.http.ResponseEntity;
import org.springframework.validation.FieldError;
import org.springframework.web.bind.MethodArgumentNotValidException;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;

import java.time.Instant;
import java.util.UUID;
import java.util.stream.Collectors;

@RestControllerAdvice
@Slf4j
public class GlobalExceptionHandler {

    @ExceptionHandler(AiApiException.class)
    public ResponseEntity<ErrorResponse> handleAiException(AiApiException ex) {
        log.warn("AI API异常: code={}, message={}", ex.getErrorCode(), ex.getMessage());

        ErrorResponse errorResponse = new ErrorResponse(
            ex.getErrorCode().getCode(),
            ex.getMessage() != null ? ex.getMessage()
                : ex.getErrorCode().getDefaultMessage(),
            generateRequestId(),
            Instant.now().toString(),
            ex.getDetails()
        );

        return ResponseEntity
            .status(ex.getErrorCode().toHttpStatus())
            .body(errorResponse);
    }

    @ExceptionHandler(MethodArgumentNotValidException.class)
    public ResponseEntity<ErrorResponse> handleValidation(
            MethodArgumentNotValidException ex) {

        String details = ex.getBindingResult().getFieldErrors()
            .stream()
            .map(FieldError::getDefaultMessage)
            .collect(Collectors.joining("; "));

        ErrorResponse errorResponse = new ErrorResponse(
            AiErrorCode.INVALID_PARAMETER.getCode(),
            "请求参数验证失败: " + details,
            generateRequestId(),
            Instant.now().toString(),
            null
        );

        return ResponseEntity.badRequest().body(errorResponse);
    }

    private String generateRequestId() {
        return "req_" + UUID.randomUUID().toString().replace("-", "").substring(0, 16);
    }
}

// ErrorResponse.java
record ErrorResponse(
    String code,
    String message,
    String requestId,
    String timestamp,
    Object details
) {}

8. Rate Limiting Design: Token Bucket + Sliding Window

8.1 Two-Tier Rate Limiting Strategy

// RateLimitFilter.java
package com.laozhang.api.filter;

import com.laozhang.api.error.AiErrorCode;
import io.github.bucket4j.*;
import jakarta.servlet.*;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.core.annotation.Order;
import org.springframework.stereotype.Component;

import java.io.IOException;
import java.time.Duration;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

@Component
@Order(2)
@Slf4j
@RequiredArgsConstructor
public class RateLimitFilter implements Filter {

    // One token bucket per API key (in-memory; switch to Redis in production)
    private final ConcurrentMap<String, Bucket> buckets = new ConcurrentHashMap<>();

    // Standard tier limits: 60 requests/minute and 10,000 requests/day
    private static final long STANDARD_REQUESTS_PER_MINUTE = 60L;
    private static final long STANDARD_REQUESTS_PER_DAY = 10000L;

    @Override
    public void doFilter(ServletRequest req, ServletResponse resp,
                          FilterChain chain) throws IOException, ServletException {

        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) resp;

        String apiKey = extractApiKey(request);
        if (apiKey == null) {
            chain.doFilter(req, resp);
            return;
        }

        Bucket bucket = getOrCreateBucket(apiKey);

        // Try to consume one token
        ConsumptionProbe probe = bucket.tryConsumeAndReturnRemaining(1);

        if (probe.isConsumed()) {
            // Expose the rate limit state in response headers
            response.setHeader("X-RateLimit-Limit",
                String.valueOf(STANDARD_REQUESTS_PER_MINUTE));
            response.setHeader("X-RateLimit-Remaining",
                String.valueOf(probe.getRemainingTokens()));
            response.setHeader("X-RateLimit-Reset",
                String.valueOf(System.currentTimeMillis() / 1000 + 60));

            chain.doFilter(req, resp);
        } else {
            // Rate limited: return 429
            long waitForRefillNanos = probe.getNanosToWaitForRefill();
            long retryAfterSeconds = (waitForRefillNanos / 1_000_000_000L) + 1;

            response.setStatus(429);
            // Declare UTF-8 explicitly so the body is not written in the servlet default charset
            response.setContentType("application/json");
            response.setCharacterEncoding("UTF-8");
            response.setHeader("Retry-After", String.valueOf(retryAfterSeconds));
            response.setHeader("X-RateLimit-Limit",
                String.valueOf(STANDARD_REQUESTS_PER_MINUTE));
            response.setHeader("X-RateLimit-Remaining", "0");

            response.getWriter().write("""
                {
                  "code": "RATE_LIMIT_EXCEEDED",
                  "message": "Rate limit exceeded (60 requests/minute). Please retry in %d seconds",
                  "retryAfterSeconds": %d
                }
                """.formatted(retryAfterSeconds, retryAfterSeconds));
        }
    }

    private Bucket getOrCreateBucket(String apiKey) {
        return buckets.computeIfAbsent(apiKey, key -> {
            // Two tiers: per-minute + per-day
            return Bucket.builder()
                // Tier 1: 60 requests per minute (burst protection)
                .addLimit(Bandwidth.classic(
                    STANDARD_REQUESTS_PER_MINUTE,
                    Refill.intervally(STANDARD_REQUESTS_PER_MINUTE,
                        Duration.ofMinutes(1))
                ))
                // Tier 2: 10,000 requests per day (abuse protection)
                .addLimit(Bandwidth.classic(
                    STANDARD_REQUESTS_PER_DAY,
                    Refill.intervally(STANDARD_REQUESTS_PER_DAY,
                        Duration.ofDays(1))
                ))
                .build();
        });
    }

    private String extractApiKey(HttpServletRequest request) {
        String auth = request.getHeader("Authorization");
        if (auth != null && auth.startsWith("Bearer ")) {
            return auth.substring(7);
        }
        return null;
    }
}
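
The filter above relies on Bucket4j token buckets; the sliding-window half of this section's title is not shown in code. As a rough illustration only, here is a minimal sliding-window-log limiter in plain Java. The class name `SlidingWindowLimiter` and its methods are hypothetical (they are not part of Bucket4j), and the sketch assumes a single instance holding state in memory.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sliding-window-log limiter; illustrative only, not a Bucket4j API. */
final class SlidingWindowLimiter {
    private final int maxRequests;
    private final long windowMillis;
    // Timestamps (in ms) of accepted requests, oldest first
    private final Deque<Long> accepted = new ArrayDeque<>();

    SlidingWindowLimiter(int maxRequests, long windowMillis) {
        if (maxRequests <= 0 || windowMillis <= 0) {
            throw new IllegalArgumentException("maxRequests and windowMillis must be positive");
        }
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    /** Accepts and records the request if fewer than maxRequests were accepted in the last window. */
    synchronized boolean tryAcquire(long nowMillis) {
        evictExpired(nowMillis);
        if (accepted.size() < maxRequests) {
            accepted.addLast(nowMillis);
            return true;
        }
        return false;
    }

    /** Seconds until the oldest accepted request leaves the window; 0 if a slot is already free. */
    synchronized long retryAfterSeconds(long nowMillis) {
        evictExpired(nowMillis);
        if (accepted.size() < maxRequests) {
            return 0;
        }
        long waitMillis = accepted.peekFirst() + windowMillis - nowMillis;
        return (waitMillis + 999) / 1000; // round up to whole seconds
    }

    // Drops timestamps that are at least one full window old
    private void evictExpired(long nowMillis) {
        while (!accepted.isEmpty() && nowMillis - accepted.peekFirst() >= windowMillis) {
            accepted.removeFirst();
        }
    }
}
```

Compared with an interval-refilled token bucket, a sliding window never lets a client spend two full quotas back to back across a refill boundary, at the cost of storing one timestamp per accepted request.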

9. SDK Generation: Generating a Java Client Automatically from OpenAPI

9.1 Maven Plugin Configuration (Automatic SDK Generation)

<!-- pom.xml (SDK generator) -->
<plugin>
    <groupId>org.openapitools</groupId>
    <artifactId>openapi-generator-maven-plugin</artifactId>
    <version>7.8.0</version>
    <executions>
        <execution>
            <goals>
                <goal>generate</goal>
            </goals>
            <configuration>
                <!-- Path to the OpenAPI specification file -->
                <inputSpec>
                    ${project.basedir}/src/main/resources/openapi.yaml
                </inputSpec>

                <!-- Generate a Java client -->
                <generatorName>java</generatorName>

                <!-- Output directory -->
                <output>${project.build.directory}/generated-sources/openapi</output>

                <!-- Client options -->
                <configOptions>
                    <library>resttemplate</library>
                    <apiPackage>com.laozhang.sdk.api</apiPackage>
                    <modelPackage>com.laozhang.sdk.model</modelPackage>
                    <invokerPackage>com.laozhang.sdk</invokerPackage>
                    <dateLibrary>java8</dateLibrary>
                    <useOptional>true</useOptional>
                    <generateBuilders>true</generateBuilders>
                    <serializationLibrary>jackson</serializationLibrary>
                </configOptions>
            </configuration>
        </execution>
    </executions>
</plugin>

9.2 Using the Generated SDK

// SDK usage example (auto-generated client)
package com.example.demo;

import com.laozhang.sdk.ApiClient;
import com.laozhang.sdk.api.AiGenerationApi;
import com.laozhang.sdk.model.StreamGenerateRequest;
import com.laozhang.sdk.model.TaskStatusResponse;

public class SdkUsageExample {

    public static void main(String[] args) throws Exception {
        // Initialize the client
        ApiClient client = new ApiClient();
        client.setBasePath("https://api.example.com/api/v2");
        client.setBearerToken("your_api_key_here");

        AiGenerationApi api = new AiGenerationApi(client);

        // Submit an asynchronous task
        var request = new StreamGenerateRequest()
            .prompt("Analyze the risk clauses in this contract")
            .model("gpt-4o")
            .maxTokens(2048);

        var submitResponse = api.submitTask(request);
        String taskId = submitResponse.getTaskId();

        System.out.println("Task submitted, taskId: " + taskId);
        System.out.println("Suggested wait before polling: " + submitResponse.getEstimatedWaitSeconds() + " seconds");

        // Poll the task status (exponential backoff: 1s, 2s, 4s, ... capped at 60s)
        int attempt = 0;
        while (true) {
            Thread.sleep(calculateDelay(attempt++) * 1000L);

            TaskStatusResponse status = api.getTaskStatus(taskId);
            System.out.printf("Task status: %s, progress: %d%%%n",
                status.getStatus(), status.getProgress());

            if (status.isCompleted()) {
                var result = api.getTaskResult(taskId);
                System.out.println("Analysis result: " + result.getContent());
                break;
            }

            if ("failed".equals(status.getStatus())) {
                System.err.println("Task failed: " + status.getFailureReason());
                break;
            }
        }
    }

    private static long calculateDelay(int attempt) {
        return Math.min(60, (long) Math.pow(2, attempt));
    }
}
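
The polling loop above backs off exponentially but never gives up, and it does not use the wait hint returned at submission. One possible refinement, sketched here under assumptions: build the whole delay schedule up front, let the first wait honor the server's hint, and bound the number of attempts. `PollingSchedule` and its parameters are hypothetical names, not part of the generated SDK.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: a bounded, capped exponential backoff schedule for task polling. */
final class PollingSchedule {

    /**
     * Returns the wait in seconds before each poll.
     * The first wait uses the server hint (at least 1s); later waits double up to capSeconds.
     */
    static List<Long> delaysSeconds(long serverHintSeconds, int maxAttempts, long capSeconds) {
        if (maxAttempts < 0 || capSeconds < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 0 and capSeconds >= 1");
        }
        List<Long> delays = new ArrayList<>();
        long delay = Math.max(1, serverHintSeconds);
        for (int i = 0; i < maxAttempts; i++) {
            delays.add(Math.min(capSeconds, delay));
            // Stop doubling once the cap is reached, which also avoids overflow
            delay = delay >= capSeconds ? capSeconds : delay * 2;
        }
        return delays;
    }
}
```

A caller would iterate over the returned list, sleep for each delay, poll once, and treat running out of entries as a client-side timeout.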

10. Performance Data and Best Practices Summary

10.1 Performance Comparison of API Modes

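Metric	Synchronous API	SSE streaming	Async polling	WebHook
Time to first byte	N/A	100~500ms	N/A	N/A
Full response time	1~30s	1~30s (streaming feels better)	Non-blocking	Non-blocking
Concurrent server connections	1x	1x	0.01x (long polling intervals)	0 (server pushes)
Client complexity	Simple	Medium	Medium	Complex (needs a public endpoint)
Best suited for	Responses under 3s	Showing generation in real time	3s~3min	Over 3min or batch jobs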
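
The "best suited for" row of the table can be read as a small decision rule. The sketch below encodes it in Java; the `ApiMode` enum, the `ApiModeChooser` class, and the precedence between the rules (batch and long jobs first, then live display, then duration) are assumptions made for illustration, since the table itself does not rank them.

```java
import java.time.Duration;

/** The four interaction modes compared in the table. */
enum ApiMode { SYNC, SSE_STREAM, ASYNC_POLLING, WEBHOOK }

/** Hypothetical chooser that applies the table's thresholds (3 seconds and 3 minutes). */
final class ApiModeChooser {

    static ApiMode choose(Duration expected, boolean showLiveOutput, boolean batch) {
        // Batch work and anything expected to exceed 3 minutes: push the result via WebHook
        if (batch || expected.compareTo(Duration.ofMinutes(3)) > 0) {
            return ApiMode.WEBHOOK;
        }
        // The user watches the text being generated: stream it over SSE
        if (showLiveOutput) {
            return ApiMode.SSE_STREAM;
        }
        // Fast enough to answer within a single request
        if (expected.compareTo(Duration.ofSeconds(3)) < 0) {
            return ApiMode.SYNC;
        }
        // Between 3 seconds and 3 minutes: submit a task and poll for its status
        return ApiMode.ASYNC_POLLING;
    }
}
```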

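10.2 Results After the Redesign

After rebuilding the API, Wang Fang's team collected three months of data:

Metric	Before	After
Average integration time for external developers	2 days	2 hours
Developer support tickets	47/month	8/month
API call error rate	12%	2.3%
Developer satisfaction score	3.2/5	4.6/5
SDK downloads	N/A	1,200/month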

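11. FAQ

Q1: Do streaming and non-streaming endpoints need to be maintained separately?

No. Spring AI's ChatClient exposes both styles for the same prompt: .stream().content() returns a Flux<String> (streaming) and .call().content() returns a String (non-streaming). Both can share one set of business logic, with the Controller layer branching on the response type according to the stream=true/false request parameter.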

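Q2: What happens when an SSE connection drops?

SSE has built-in support for resuming. The client remembers the id field of the last message it received and sends it in the Last-Event-ID header when it reconnects. The protocol only carries this ID, so the server has to retain recent events and use the ID to resend the ones that were not delivered. In Spring WebFlux, set the message ID with ServerSentEvent.builder().id(...).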
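
To make the server side of that resume flow concrete, here is a minimal sketch of a bounded replay buffer in plain Java. `SseReplayBuffer`, its `Event` record, and the choice of numeric, monotonically increasing IDs are assumptions for illustration; a real service would likely keep one buffer per stream and may need shared storage across instances.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical in-memory buffer that replays events newer than a Last-Event-ID value. */
final class SseReplayBuffer {
    record Event(long id, String data) {}

    private final int capacity;
    private final Deque<Event> recent = new ArrayDeque<>();
    private long nextId = 1;

    SseReplayBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Assigns the next ID, stores the event, and evicts the oldest one when the buffer is full. */
    synchronized Event append(String data) {
        Event event = new Event(nextId++, data);
        if (recent.size() == capacity) {
            recent.removeFirst();
        }
        recent.addLast(event);
        return event;
    }

    /** Buffered events with an ID greater than the header value; a missing or invalid header replays everything buffered. */
    synchronized List<Event> eventsAfter(String lastEventIdHeader) {
        long lastId = parseId(lastEventIdHeader);
        List<Event> missed = new ArrayList<>();
        for (Event event : recent) {
            if (event.id() > lastId) {
                missed.add(event);
            }
        }
        return missed;
    }

    private static long parseId(String header) {
        if (header == null || header.isBlank()) {
            return 0;
        }
        try {
            return Long.parseLong(header.trim());
        } catch (NumberFormatException e) {
            return 0;
        }
    }
}
```

On reconnect, the handler would first emit `eventsAfter(lastEventId)` and then continue with live events. Events that have already been evicted from the buffer cannot be replayed, so the capacity bounds how long a disconnect can be recovered from.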

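Q3: If the WebHook target URL is unreachable, are messages lost?

A correctly implemented WebHook system does not lose messages. Retry with exponential backoff (1 minute → 5 minutes → 30 minutes → 2 hours → 8 hours), 5 retries in total. If all of them fail, persist the message to a dead-letter queue so that operations staff can retry it manually.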
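
The retry schedule in this answer can be captured as a small lookup. The following is a sketch under assumptions: `WebhookRetryPolicy` and `nextDelay` are hypothetical names, and an empty result is used here as the signal to move the message to the dead-letter queue.

```java
import java.time.Duration;
import java.util.List;
import java.util.Optional;

/** Hypothetical retry policy using the schedule above: 1m, 5m, 30m, 2h, 8h, then dead-letter. */
final class WebhookRetryPolicy {
    private static final List<Duration> DELAYS = List.of(
        Duration.ofMinutes(1),
        Duration.ofMinutes(5),
        Duration.ofMinutes(30),
        Duration.ofHours(2),
        Duration.ofHours(8)
    );

    /**
     * @param retriesSoFar number of retries already attempted (0 right after the first delivery fails)
     * @return the delay before the next retry, or empty when the message should go to the dead-letter queue
     */
    static Optional<Duration> nextDelay(int retriesSoFar) {
        if (retriesSoFar < 0) {
            throw new IllegalArgumentException("retriesSoFar must not be negative");
        }
        return retriesSoFar < DELAYS.size()
            ? Optional.of(DELAYS.get(retriesSoFar))
            : Optional.empty();
    }
}
```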

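Q4: Should API rate limiting use Redis or memory?

Development / single instance: in-memory (Bucket4j).
Production with multiple instances: Redis (Bucket4j with a Redis backend, or Spring Cloud Gateway's RedisRateLimiter), so that all instances share one counter. Otherwise each instance counts independently, and the traffic actually admitted is N times the configured limit.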

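Q5: How do you keep the OpenAPI document consistent with the code?

Code First is recommended: annotate Spring MVC controllers with @Operation/@Schema and let Springdoc scan them and generate the OpenAPI document automatically. This avoids hand-written YAML drifting out of sync with the code. In the CI pipeline, add openapi-diff to compare the new and old versions for breaking changes.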

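Q6: How do AI error codes map to HTTP status codes?

AI-specific errors are generally 4xx errors that can be described precisely (parameter problems, quota problems, content filtering). Return 503 when the model is unavailable and 504 when the model times out. Reserve 500 for genuinely unexpected exceptions. Note: when content is filtered, distinguish it with a business-level error code and do not use 500, because filtering is the system working as intended.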
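
One way to keep this mapping in a single place is an enum that pairs each business code with its status. This is a sketch only: `SketchErrorCode` is a hypothetical stand-in rather than the article's AiErrorCode, and the specific 4xx choice for content filtering (422) is an assumption, since the answer above only says it should be a 4xx.

```java
/** Hypothetical error catalogue; code names echo the article, specific 4xx values are assumptions. */
enum SketchErrorCode {
    INVALID_PARAMETER(400),
    CONTENT_FILTER_TRIGGERED(422), // assumption: any descriptive 4xx would fit the rule above
    RATE_LIMIT_EXCEEDED(429),
    MODEL_UNAVAILABLE(503),
    MODEL_TIMEOUT(504),
    INTERNAL_ERROR(500);           // reserved for genuinely unexpected failures

    private final int httpStatus;

    SketchErrorCode(int httpStatus) {
        this.httpStatus = httpStatus;
    }

    int httpStatus() {
        return httpStatus;
    }

    /** True for errors the caller can fix by changing the request. */
    boolean isClientError() {
        return httpStatus >= 400 && httpStatus < 500;
    }

    /** Retrying only helps for throttling and transient upstream failures. */
    boolean isRetryable() {
        return httpStatus == 429 || httpStatus == 503 || httpStatus == 504;
    }
}
```

Exposing `isRetryable()` alongside the status gives SDK authors a simple signal for when to back off and retry versus when to surface the error to the user.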

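Summary

Good AI API design puts developer experience at the center:

Streaming first: SSE lets users see the AI "thinking", which feels far better than waiting
Transparent async: a clear task state machine plus a sensible Retry-After, so developers never have to guess
Semantic errors: AI-specific error codes tell developers how to handle each failure
Automatic documentation: OpenAPI plus auto-generated SDKs lower the barrier to integration
Friendly rate limiting: tell developers how much quota is left instead of failing without warning

From a 2-day integration to a 2-hour one: that is the value of good API design.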