Spring AI + Spring Security: Hands-On Authentication and Rate Limiting for AI APIs
Audience: Java engineers with Spring Boot experience who are shipping, or about to ship, AI features. Reading time: about 18 minutes
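Lao Wang's AI API got hammered
Last month, Lao Wang excitedly launched his company's internal AI Q&A system. The first three days went fine. On the morning of the fourth day he opened the monitoring dashboard, and his face fell.
His OpenAI bill showed an anomalous $2,000 charge. Tracing it back, a front-end colleague had accidentally posted the API address in a tech group chat, and a few people had written scripts to call it nonstop. No authentication, no rate limiting, no protection of any kind: the AI endpoint sat exposed on the public internet until the account balance ran dry and the calls stopped on their own.
Lao Wang came to me and said: "With ordinary REST APIs, the worst that happened was someone freeloading a bit of server capacity. But with an AI API... every single call costs real money."
He's right. Protecting an AI API is a problem of a different order from protecting a traditional one. An unprotected AI endpoint can run up a ruinous bill overnight.
Today I'll walk you through the complete authentication + rate-limiting setup my team uses.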
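Why AI APIs need special treatment
When a traditional REST API is abused, the worst case is usually CPU spikes and saturated bandwidth, and the cost stays largely bounded by infrastructure you already pay for.
AI APIs are different: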
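| Risk | Traditional API | AI API |
|---|---|---|
| Direct cost of malicious calls | Server resources | Token fees (direct monetary loss) |
| Latency per call | Milliseconds | Seconds to minutes |
| Response size | Usually small | Large text, often streamed |
| Detecting abuse | Visible in traffic monitoring | May go unnoticed until the bill arrives |
| Rate-limit strategy | By request count | Must also limit by token count |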
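So for AI APIs we need:
- Mandatory authentication: anyone we don't recognize is stopped at the door
- Fine-grained authorization: different callers may use different models
- Two-dimensional rate limiting: by request count and by token consumption
- Complete audit logging: every call can be traced after the fact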
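Overall architecture
The core flow has three layers: the authentication layer establishes identity, the rate-limiting layer controls usage, and the monitoring layer records consumption.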
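As a framework-free illustration of that ordering, here is a minimal Java sketch. Every name in it (`AiGateway`, `Caller`, `Reply`, `handle`) is hypothetical and not part of Spring; the three function parameters stand in for the Spring components built in the steps below. The point is only that an unknown caller, or a known caller who is over budget, is turned away before any money is spent on the model call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical, framework-free model of the request path: authenticate -> rate-limit -> call + record.
final class AiGateway {
    record Caller(String keyId) {}
    record Reply(int status, String body) {}

    private final Function<String, Optional<Caller>> authenticate; // layer 1: who is calling?
    private final Predicate<Caller> withinLimits;                   // layer 2: may they call right now?
    private final Function<String, String> model;                   // the paid AI call itself
    private final List<String> auditLog = new ArrayList<>();        // layer 3: what was consumed?

    AiGateway(Function<String, Optional<Caller>> authenticate,
              Predicate<Caller> withinLimits,
              Function<String, String> model) {
        this.authenticate = authenticate;
        this.withinLimits = withinLimits;
        this.model = model;
    }

    Reply handle(String apiKey, String prompt) {
        Optional<Caller> caller = authenticate.apply(apiKey);
        if (caller.isEmpty()) {
            return new Reply(401, "unauthorized");        // unknown callers never reach the model
        }
        if (!withinLimits.test(caller.get())) {
            return new Reply(429, "rate_limit_exceeded"); // known caller, but over budget
        }
        String answer = model.apply(prompt);              // only now is money spent
        auditLog.add(caller.get().keyId() + " prompt=" + prompt.length() + " chars");
        return new Reply(200, answer);
    }

    List<String> auditLog() {
        return List.copyOf(auditLog);
    }
}
```

Keeping the model call last is the whole design: every check that can fail cheaply runs before the one step that costs tokens.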
Hands-on: building it from scratch
Step 1: Dependencies
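```xml
<dependencies>
    <!-- Spring AI (version managed by importing org.springframework.ai:spring-ai-bom
         in <dependencyManagement>, not shown) -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    </dependency>
    <!-- Spring Security -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-security</artifactId>
    </dependency>
    <!-- Redis, for the shared rate-limit counters -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-redis</artifactId>
    </dependency>
    <!-- Required by the Lettuce connection-pool settings in application.yml -->
    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-pool2</artifactId>
    </dependency>
    <!-- Resilience4j rate limiting; not managed by Spring Boot, so declare a version
         or import io.github.resilience4j:resilience4j-bom -->
    <dependency>
        <groupId>io.github.resilience4j</groupId>
        <artifactId>resilience4j-spring-boot3</artifactId>
    </dependency>
    <!-- AOP, for audit logging -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-aop</artifactId>
    </dependency>
    <!-- JPA, for the ApiKey entity in step 2 (plus a JDBC driver for your database) -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-jpa</artifactId>
    </dependency>
    <!-- Lombok, for the @Builder/@Slf4j/@RequiredArgsConstructor annotations used throughout -->
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <optional>true</optional>
    </dependency>
</dependencies>
```
Step 2: API key data model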
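```java
import jakarta.persistence.*;
import lombok.*;

import java.time.LocalDateTime;
import java.util.Set;

@Entity
@Table(name = "api_keys")
@Getter
@Setter   // instead of @Data: its equals/hashCode/toString are risky on JPA entities with collections
@Builder
@NoArgsConstructor
@AllArgsConstructor
public class ApiKey {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @Column(unique = true, nullable = false)
    private String keyValue;        // SHA-256 hash of the key; the raw key itself is never stored

    @Column(nullable = false)
    private String keyName;         // Human-readable alias, for management

    @Column(nullable = false)
    private String ownerId;         // Owning user/application ID

    @Enumerated(EnumType.STRING)
    private KeyStatus status;       // ACTIVE / REVOKED / EXPIRED

    // Request-level limits
    private Integer rpmLimit;       // Max requests per minute (RPM)
    private Integer rpdLimit;       // Max requests per day (RPD)

    // Token-level limits
    private Integer tpmLimit;       // Max tokens per minute (TPM)
    private Long tpdLimit;          // Max tokens per day (TPD)
    private Integer maxTokensPerRequest; // Cap on max_tokens per call (read by the controller in step 5)

    // Permissions. EAGER because the key is usually loaded in the security filter, outside
    // any open persistence session, where a LAZY collection would fail to initialize.
    @ElementCollection(fetch = FetchType.EAGER)
    @CollectionTable(name = "api_key_permissions")
    private Set<String> allowedModels; // Models this key may call

    @Column(nullable = false)
    private LocalDateTime createdAt;
    private LocalDateTime expiresAt;   // null = never expires
    private LocalDateTime lastUsedAt;

    public enum KeyStatus {
        ACTIVE, REVOKED, EXPIRED
    }

    public boolean isValid() {
        if (status != KeyStatus.ACTIVE) return false;
        if (expiresAt != null && LocalDateTime.now().isAfter(expiresAt)) return false;
        return true;
    }
}
```
Step 3: Spring Security configuration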
This is the core of the whole solution. Instead of the usual username/password flow, we authenticate callers by API key through a custom filter.
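The filter below hands the actual check to `apiKeyService.validateAndLoadKey`, which the article doesn't show. Since the `ApiKey` entity stores a hash in a unique `keyValue` column and the configuration names SHA-256, one plausible shape is: hash the presented key deterministically, then look the hash up. The sketch assumes exactly that; `ApiKeyHasher` and `InMemoryKeyStore` are hypothetical names, and a `Map` stands in for the database table.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.HexFormat;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

final class ApiKeyHasher {
    private static final SecureRandom RANDOM = new SecureRandom();

    // 32 random bytes -> 43-character URL-safe Base64 string, above the 32-character minimum
    static String generateKey() {
        byte[] bytes = new byte[32];
        RANDOM.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // Deterministic SHA-256 hex digest: the same raw key always yields the same value,
    // so it can be found through a unique column (a salted BCrypt hash cannot).
    static String sha256Hex(String rawKey) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(digest.digest(rawKey.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }
}

// Hypothetical in-memory stand-in for the api_keys table, indexed by key hash.
final class InMemoryKeyStore {
    private final Map<String, String> ownerByHash = new ConcurrentHashMap<>();

    // Issue a key: persist only its hash and hand the raw value back exactly once.
    String issue(String ownerId) {
        String raw = ApiKeyHasher.generateKey();
        ownerByHash.put(ApiKeyHasher.sha256Hex(raw), ownerId);
        return raw;
    }

    // Roughly what a hash-based validateAndLoadKey could do: hash the presented key, look it up.
    Optional<String> lookupOwner(String presentedKey) {
        return Optional.ofNullable(ownerByHash.get(ApiKeyHasher.sha256Hex(presentedKey)));
    }
}
```

An unsalted fast hash is acceptable here only because the keys are long random values rather than human-chosen passwords, which is presumably why the configuration enforces a 32-character minimum.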
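```java
// ===== SecurityConfig.java =====
import lombok.RequiredArgsConstructor;
import org.springframework.boot.web.servlet.FilterRegistrationBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.config.annotation.method.configuration.EnableMethodSecurity;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.config.annotation.web.configuration.EnableWebSecurity;
import org.springframework.security.config.annotation.web.configurers.AbstractHttpConfigurer;
import org.springframework.security.config.http.SessionCreationPolicy;
import org.springframework.security.crypto.bcrypt.BCryptPasswordEncoder;
import org.springframework.security.crypto.password.PasswordEncoder;
import org.springframework.security.web.SecurityFilterChain;
import org.springframework.security.web.authentication.UsernamePasswordAuthenticationFilter;

@Configuration
@EnableWebSecurity
@EnableMethodSecurity   // enables @PreAuthorize on the controller
@RequiredArgsConstructor
public class SecurityConfig {
    private final ApiKeyAuthFilter apiKeyAuthFilter;

    @Bean
    public SecurityFilterChain filterChain(HttpSecurity http) throws Exception {
        http
            // AI endpoints are stateless: no HTTP session
            .sessionManagement(session ->
                session.sessionCreationPolicy(SessionCreationPolicy.STATELESS))
            // CSRF protects cookie-based browser sessions; a header-based API key doesn't need it
            .csrf(AbstractHttpConfigurer::disable)
            // Access rules
            .authorizeHttpRequests(auth -> auth
                // Health check is public
                .requestMatchers("/actuator/health").permitAll()
                // API docs are public
                .requestMatchers("/swagger-ui/**", "/v3/api-docs/**").permitAll()
                // All AI endpoints require authentication
                .requestMatchers("/api/ai/**").authenticated()
                // Admin endpoints require the ADMIN role (authority "ROLE_ADMIN")
                .requestMatchers("/api/admin/**").hasRole("ADMIN")
                .anyRequest().authenticated()
            )
            // Run the API key filter before UsernamePasswordAuthenticationFilter
            .addFilterBefore(apiKeyAuthFilter, UsernamePasswordAuthenticationFilter.class)
            // Custom JSON error handling (ApiKeyAuthenticationEntryPoint and
            // ApiKeyAccessDeniedHandler are custom classes, not shown)
            .exceptionHandling(ex -> ex
                .authenticationEntryPoint(new ApiKeyAuthenticationEntryPoint())
                .accessDeniedHandler(new ApiKeyAccessDeniedHandler())
            );
        return http.build();
    }

    // ApiKeyAuthFilter is a @Component, so Spring Boot would also register it as an ordinary
    // servlet filter outside the security chain. Disable that registration so it runs only
    // inside the SecurityFilterChain.
    @Bean
    public FilterRegistrationBean<ApiKeyAuthFilter> apiKeyAuthFilterRegistration(ApiKeyAuthFilter filter) {
        FilterRegistrationBean<ApiKeyAuthFilter> registration = new FilterRegistrationBean<>(filter);
        registration.setEnabled(false);
        return registration;
    }

    // BCrypt is salted, so it cannot be used to look API keys up by hash (keys use SHA-256,
    // see app.security.api-key.hash-algorithm). This encoder only serves password-based accounts, if any.
    @Bean
    public PasswordEncoder passwordEncoder() {
        return new BCryptPasswordEncoder();
    }
}

// ===== ApiKeyAuthFilter.java =====
import com.fasterxml.jackson.databind.ObjectMapper;
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpStatus;
import org.springframework.http.MediaType;
import org.springframework.security.core.context.SecurityContextHolder;
import org.springframework.security.web.authentication.WebAuthenticationDetailsSource;
import org.springframework.stereotype.Component;
import org.springframework.util.StringUtils;
import org.springframework.web.filter.OncePerRequestFilter;

import java.io.IOException;
import java.util.Map;

/**
 * API key authentication filter.
 * Extracts the API key from the request headers and validates it.
 * (ApiKeyService, ApiKeyPrincipal, ApiKeyAuthToken, RateLimitResult and
 * InvalidApiKeyException are custom classes not shown in this article.)
 */
@Component
@RequiredArgsConstructor
@Slf4j
public class ApiKeyAuthFilter extends OncePerRequestFilter {
    private static final String API_KEY_HEADER = "X-API-Key";
    private static final String BEARER_PREFIX = "Bearer ";

    private final ApiKeyService apiKeyService;
    private final RateLimiterService rateLimiterService;
    private final ObjectMapper objectMapper;   // Spring Boot's shared, pre-configured instance

    @Override
    protected void doFilterInternal(HttpServletRequest request,
                                    HttpServletResponse response,
                                    FilterChain filterChain) throws ServletException, IOException {
        String apiKey = extractApiKey(request);
        if (apiKey == null) {
            // No key: continue unauthenticated; the authorization rules reject protected paths
            filterChain.doFilter(request, response);
            return;
        }
        try {
            // Validate the API key
            ApiKeyPrincipal principal = apiKeyService.validateAndLoadKey(apiKey);

            // Request-level rate limit
            RateLimitResult rateLimitResult = rateLimiterService.checkRequestLimit(principal);
            if (!rateLimitResult.isAllowed()) {
                writeRateLimitResponse(response, rateLimitResult);
                return;
            }

            // Store the authentication in the SecurityContext
            ApiKeyAuthToken authToken = new ApiKeyAuthToken(principal, principal.getAuthorities());
            authToken.setDetails(new WebAuthenticationDetailsSource().buildDetails(request));
            SecurityContextHolder.getContext().setAuthentication(authToken);
            log.debug("API key authenticated: keyName={}, owner={}",
                    principal.getKeyName(), principal.getOwnerId());
        } catch (InvalidApiKeyException e) {
            log.warn("Invalid API key: {}", maskKey(apiKey));   // never log the full key
            SecurityContextHolder.clearContext();
            writeUnauthorizedResponse(response, e.getMessage());
            return;
        }
        filterChain.doFilter(request, response);
    }

    private String extractApiKey(HttpServletRequest request) {
        // Prefer the dedicated header
        String headerValue = request.getHeader(API_KEY_HEADER);
        if (StringUtils.hasText(headerValue)) {
            return headerValue.trim();
        }
        // Also accept "Authorization: Bearer <key>"
        String authHeader = request.getHeader(HttpHeaders.AUTHORIZATION);
        if (StringUtils.hasText(authHeader) && authHeader.startsWith(BEARER_PREFIX)) {
            return authHeader.substring(BEARER_PREFIX.length()).trim();
        }
        return null;
    }

    // Show only the first and last 4 characters in logs
    private String maskKey(String key) {
        if (key == null || key.length() < 8) return "****";
        return key.substring(0, 4) + "****" + key.substring(key.length() - 4);
    }

    private void writeRateLimitResponse(HttpServletResponse response,
                                        RateLimitResult result) throws IOException {
        response.setStatus(HttpStatus.TOO_MANY_REQUESTS.value());
        response.setContentType(MediaType.APPLICATION_JSON_VALUE);
        response.setCharacterEncoding("UTF-8");
        response.setHeader("X-RateLimit-Limit", String.valueOf(result.getLimit()));
        response.setHeader("X-RateLimit-Remaining", "0");
        response.setHeader("X-RateLimit-Reset", String.valueOf(result.getResetTime()));
        response.setHeader("Retry-After", String.valueOf(result.getRetryAfterSeconds()));
        Map<String, Object> body = Map.of(
                "error", "rate_limit_exceeded",
                "message", result.getMessage(),
                "retry_after", result.getRetryAfterSeconds()
        );
        response.getWriter().write(objectMapper.writeValueAsString(body));
    }

    private void writeUnauthorizedResponse(HttpServletResponse response,
                                           String message) throws IOException {
        response.setStatus(HttpStatus.UNAUTHORIZED.value());
        response.setContentType(MediaType.APPLICATION_JSON_VALUE);
        response.setCharacterEncoding("UTF-8");
        Map<String, String> body = Map.of("error", "unauthorized", "message", message);
        response.getWriter().write(objectMapper.writeValueAsString(body));
    }
}
```
Step 4: Two-dimensional rate limiting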
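This is where I hit the most pitfalls. At first we only limited request counts. After launch we found someone deliberately sending extremely long prompts: their request count stayed under the limit, but their token consumption was dozens of times that of a normal user. So token consumption has to be limited as well.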
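Stripped of Redis, the core idea fits in a few lines of plain Java. This sketch (the class `DualWindowLimiter` and its methods are hypothetical) follows the same flow as the service below: requests are counted up front, tokens are added after the call, and the next request is refused if either per-minute budget is used up. It only works inside a single process; sharing counters across nodes is what Redis is for.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-process illustration of limiting requests AND tokens per one-minute window.
final class DualWindowLimiter {
    record Decision(boolean allowed, String reason) {}

    private final long maxRequestsPerMinute;
    private final long maxTokensPerMinute;
    // "keyId:minuteIndex" -> {requestCount, tokenCount}; old windows are never evicted here
    // (the Redis version relies on key TTLs for that)
    private final Map<String, long[]> windows = new HashMap<>();

    DualWindowLimiter(long maxRequestsPerMinute, long maxTokensPerMinute) {
        this.maxRequestsPerMinute = maxRequestsPerMinute;
        this.maxTokensPerMinute = maxTokensPerMinute;
    }

    // Before the model call: refuse if either budget for the current minute is already used up.
    synchronized Decision tryAcquire(String keyId, long epochMillis) {
        long[] counts = window(keyId, epochMillis);
        if (counts[0] >= maxRequestsPerMinute) return new Decision(false, "rpm");
        if (counts[1] >= maxTokensPerMinute) return new Decision(false, "tpm");
        counts[0]++;
        return new Decision(true, "ok");
    }

    // After the model call: add the token usage the provider actually reported.
    synchronized void recordTokens(String keyId, long epochMillis, long tokens) {
        window(keyId, epochMillis)[1] += tokens;
    }

    private long[] window(String keyId, long epochMillis) {
        return windows.computeIfAbsent(keyId + ":" + (epochMillis / 60_000), k -> new long[2]);
    }
}
```

Note that one long prompt can push the token count past its budget in a single call. A pre-check can only stop the next request, which is why token limits of this kind behave as soft limits.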
```java
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.data.redis.core.script.DefaultRedisScript;
import org.springframework.stereotype.Service;

import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.List;

@Service
@RequiredArgsConstructor
@Slf4j
public class RateLimiterService {
    // One StringRedisTemplate is enough: all keys, values and script arguments are strings
    private final StringRedisTemplate redisTemplate;

    // Redis key prefixes
    private static final String RPM_KEY = "ratelimit:rpm:";
    private static final String RPD_KEY = "ratelimit:rpd:";
    private static final String TPM_KEY = "ratelimit:tpm:";
    private static final String TPD_KEY = "ratelimit:tpd:";

    private static final DateTimeFormatter MINUTE_FMT = DateTimeFormatter.ofPattern("yyyyMMddHHmm");
    private static final DateTimeFormatter DAY_FMT = DateTimeFormatter.ofPattern("yyyyMMdd");

    // INCR, set the TTL on first use, and undo the INCR if the limit is exceeded, all in one
    // atomic script. Returns the new count, or -1 if rejected.
    private static final DefaultRedisScript<Long> INCR_IF_WITHIN_LIMIT = new DefaultRedisScript<>("""
            local current = redis.call('INCR', KEYS[1])
            if current == 1 then
                redis.call('EXPIRE', KEYS[1], ARGV[1])
            end
            if current > tonumber(ARGV[2]) then
                redis.call('DECR', KEYS[1])
                return -1
            end
            return current
            """, Long.class);

    // INCRBY, then make sure the key has a TTL (TTL returns -1 when none is set).
    private static final DefaultRedisScript<Long> INCRBY_WITH_EXPIRY = new DefaultRedisScript<>("""
            local current = redis.call('INCRBY', KEYS[1], ARGV[1])
            if redis.call('TTL', KEYS[1]) == -1 then
                redis.call('EXPIRE', KEYS[1], ARGV[2])
            end
            return current
            """, Long.class);

    /**
     * Request-level check.
     * Increment first, then compare, atomically in Redis, so concurrent requests
     * cannot all slip past the limit.
     */
    public RateLimitResult checkRequestLimit(ApiKeyPrincipal principal) {
        String keyId = principal.getKeyId();
        String rpmKey = null;

        // Requests per minute
        if (principal.getRpmLimit() != null) {
            rpmKey = RPM_KEY + keyId + ":" + getCurrentMinuteWindow();
            if (!tryIncrement(rpmKey, 60, principal.getRpmLimit())) {
                return RateLimitResult.rejected(
                        "Per-minute request limit exceeded",
                        principal.getRpmLimit(),
                        60 - LocalDateTime.now().getSecond()
                );
            }
        }

        // Requests per day
        if (principal.getRpdLimit() != null) {
            String rpdKey = RPD_KEY + keyId + ":" + getCurrentDayWindow();
            if (!tryIncrement(rpdKey, 86400, principal.getRpdLimit())) {
                if (rpmKey != null) {
                    // Give back the per-minute slot taken above: this request is not going through
                    redisTemplate.opsForValue().decrement(rpmKey);
                }
                return RateLimitResult.rejected(
                        "Daily request limit reached",
                        principal.getRpdLimit(),
                        secondsUntilMidnight()
                );
            }
        }
        return RateLimitResult.allowed();
    }

    /**
     * Record token usage (call after the AI response has come back).
     */
    public void reportTokenUsage(String keyId, int promptTokens, int completionTokens) {
        int totalTokens = promptTokens + completionTokens;

        // Update the TPM counter
        incrementBy(TPM_KEY + keyId + ":" + getCurrentMinuteWindow(), totalTokens, 60);
        // Update the TPD counter
        incrementBy(TPD_KEY + keyId + ":" + getCurrentDayWindow(), totalTokens, 86400);

        log.debug("Token usage recorded: keyId={}, prompt={}, completion={}, total={}",
                keyId, promptTokens, completionTokens, totalTokens);
    }

    /**
     * Token budget pre-check, before the model is called. This is a soft limit: it only
     * rejects once a window is already used up, and real usage is known only afterwards,
     * so one large request (or several concurrent ones) can still overshoot.
     */
    public RateLimitResult checkTokenBudget(ApiKeyPrincipal principal) {
        String keyId = principal.getKeyId();

        if (principal.getTpmLimit() != null) {
            Long currentTpm = getCurrentCount(TPM_KEY + keyId + ":" + getCurrentMinuteWindow());
            if (currentTpm >= principal.getTpmLimit()) {
                return RateLimitResult.rejected(
                        "Per-minute token limit reached, please retry shortly",
                        principal.getTpmLimit(),
                        60 - LocalDateTime.now().getSecond()
                );
            }
        }

        if (principal.getTpdLimit() != null) {
            Long currentTpd = getCurrentCount(TPD_KEY + keyId + ":" + getCurrentDayWindow());
            if (currentTpd >= principal.getTpdLimit()) {
                return RateLimitResult.rejected(
                        "Daily token quota exhausted",
                        principal.getTpdLimit(),
                        secondsUntilMidnight()
                );
            }
        }
        return RateLimitResult.allowed();
    }

    private boolean tryIncrement(String key, long ttlSeconds, long limit) {
        Long result = redisTemplate.execute(INCR_IF_WITHIN_LIMIT, List.of(key),
                String.valueOf(ttlSeconds), String.valueOf(limit));
        return result != null && result >= 0;
    }

    private void incrementBy(String key, int delta, long ttlSeconds) {
        redisTemplate.execute(INCRBY_WITH_EXPIRY, List.of(key),
                String.valueOf(delta), String.valueOf(ttlSeconds));
    }

    private Long getCurrentCount(String key) {
        String value = redisTemplate.opsForValue().get(key);
        return value != null ? Long.parseLong(value) : 0L;
    }

    // Windows use the server's default time zone; all nodes must share the same zone
    private String getCurrentMinuteWindow() {
        return LocalDateTime.now().format(MINUTE_FMT);
    }

    private String getCurrentDayWindow() {
        return LocalDate.now().format(DAY_FMT);
    }

    private long secondsUntilMidnight() {
        LocalDateTime now = LocalDateTime.now();
        LocalDateTime midnight = now.toLocalDate().plusDays(1).atStartOfDay();
        return ChronoUnit.SECONDS.between(now, midnight);
    }
}
```
Step 5: Spring AI integration and token monitoring
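```java
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.metadata.Usage;
import org.springframework.ai.chat.prompt.ChatOptionsBuilder;
import org.springframework.http.ResponseEntity;
import org.springframework.http.codec.ServerSentEvent;
import org.springframework.security.access.prepost.PreAuthorize;
import org.springframework.security.core.annotation.AuthenticationPrincipal;
import org.springframework.validation.annotation.Validated;
import org.springframework.web.bind.annotation.*;
import reactor.core.publisher.Flux;

import java.util.concurrent.atomic.AtomicInteger;

@RestController
@RequestMapping("/api/ai")
@RequiredArgsConstructor
@Slf4j
public class AiChatController {
    private final ChatClient chatClient;
    private final RateLimiterService rateLimiterService;
    private final AuditLogService auditLogService;

    @PostMapping("/chat")
    @PreAuthorize("hasAuthority('MODEL:' + #request.model)")   // method-level check: may this key use this model?
    public ResponseEntity<ChatResponse> chat(                  // ChatResponse here is the app's own DTO
            @RequestBody @Validated ChatRequest request,
            @AuthenticationPrincipal ApiKeyPrincipal principal) {

        // Token budget pre-check
        RateLimitResult tokenBudget = rateLimiterService.checkTokenBudget(principal);
        if (!tokenBudget.isAllowed()) {
            return ResponseEntity.status(429)
                    .header("Retry-After", String.valueOf(tokenBudget.getRetryAfterSeconds()))
                    .body(ChatResponse.error(tokenBudget.getMessage()));
        }

        long startTime = System.currentTimeMillis();
        try {
            // Call Spring AI. This is the 1.0 milestone API matching the
            // spring-ai-openai-spring-boot-starter artifact; in 1.0 GA the equivalents are
            // ChatOptions.builder().model(..).maxTokens(..), getOutput().getText()
            // and usage.getCompletionTokens().
            org.springframework.ai.chat.model.ChatResponse aiResponse = chatClient.prompt()
                    .system(buildSystemPrompt(principal))   // inject system-level constraints
                    .user(request.getMessage())
                    .options(ChatOptionsBuilder.builder()
                            .withModel(request.getModel())
                            // never exceed the per-key cap (both values assumed non-null)
                            .withMaxTokens(Math.min(request.getMaxTokens(), principal.getMaxTokensPerRequest()))
                            .build())
                    .call()
                    .chatResponse();

            // Token usage as reported by the provider. intValue() rather than an (int) cast:
            // these getters return boxed numbers, and a boxed Long cannot be cast to int.
            Usage usage = aiResponse.getMetadata().getUsage();
            int promptTokens = usage.getPromptTokens().intValue();
            int completionTokens = usage.getGenerationTokens().intValue();

            // Record token usage against the limits
            rateLimiterService.reportTokenUsage(
                    principal.getKeyId(), promptTokens, completionTokens);

            // Write the audit log
            auditLogService.recordSuccess(
                    principal.getKeyId(),
                    request.getModel(),
                    promptTokens,
                    completionTokens,
                    System.currentTimeMillis() - startTime
            );

            return ResponseEntity.ok(ChatResponse.success(
                    aiResponse.getResult().getOutput().getContent(),
                    usage
            ));
        } catch (Exception e) {
            auditLogService.recordFailure(principal.getKeyId(), request.getModel(), e.getMessage());
            log.error("AI call failed: keyId={}, error={}", principal.getKeyId(), e.getMessage());
            throw e;
        }
    }

    @GetMapping("/stream")
    public Flux<ServerSentEvent<String>> streamChat(
            @RequestParam String message,
            @AuthenticationPrincipal ApiKeyPrincipal principal) {

        // Token budget pre-check
        RateLimitResult tokenBudget = rateLimiterService.checkTokenBudget(principal);
        if (!tokenBudget.isAllowed()) {
            return Flux.just(ServerSentEvent.<String>builder()
                    .event("error")
                    .data("{\"error\":\"" + tokenBudget.getMessage() + "\"}")
                    .build());
        }

        // Streaming: usage is estimated from character counts and reported when the stream ends
        AtomicInteger outputChars = new AtomicInteger(0);
        return chatClient.prompt()
                .user(message)
                .stream()
                .chatResponse()
                .map(response -> {
                    // Some chunks (e.g. a trailing metadata-only one) may carry no result
                    String content = response.getResult() != null
                            ? response.getResult().getOutput().getContent()
                            : null;
                    if (content != null) {
                        // Count characters and convert once at the end; dividing each short
                        // chunk by 4 would round most of them down to zero
                        outputChars.addAndGet(content.length());
                    }
                    return ServerSentEvent.<String>builder()
                            .event("message")
                            .data(content != null ? content : "")
                            .build();
                })
                .onErrorResume(e -> {
                    log.error("Streaming request failed: {}", e.getMessage());
                    return Flux.just(ServerSentEvent.<String>builder()
                            .event("error")
                            .data("{\"error\":\"Internal server error\"}")
                            .build());
                })
                // doFinally also runs when the client disconnects (cancel); doOnComplete
                // would not, and those tokens would never be counted
                .doFinally(signal -> {
                    int promptEstimate = estimateTokens(message.length());
                    int completionEstimate = estimateTokens(outputChars.get());
                    rateLimiterService.reportTokenUsage(principal.getKeyId(), promptEstimate, completionEstimate);
                    log.debug("Streaming request finished: keyId={}, signal={}, estimatedTokens={}",
                            principal.getKeyId(), signal, promptEstimate + completionEstimate);
                });
    }

    // Rough estimate of ~4 characters per token, a rule of thumb for English text;
    // it undercounts Chinese, where one character is often one token or more
    private static int estimateTokens(int chars) {
        return (chars + 3) / 4;
    }

    private String buildSystemPrompt(ApiKeyPrincipal principal) {
        // Inject system-level constraints based on the caller's permissions
        return String.format("""
                You are an enterprise AI assistant.
                Current user: %s
                Permission level: %s
                Strictly follow the company's data-security rules and never disclose internal information.
                """, principal.getOwnerId(), principal.getRole());
    }
}
```
Key configuration at a glance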
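```yaml
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o-mini
          max-tokens: 2000
  # Redis (under the same top-level "spring" key: a YAML file must not repeat a key)
  data:
    redis:
      host: ${REDIS_HOST:localhost}
      port: 6379
      timeout: 3000ms
      lettuce:
        pool:              # requires commons-pool2 on the classpath
          max-active: 20
          max-wait: 1000ms

# Default rate limits (custom "app" properties, bound by your own @ConfigurationProperties)
app:
  rate-limit:
    default-rpm: 60
    default-rpd: 1000
    default-tpm: 40000
    default-tpd: 500000
  security:
    api-key:
      hash-algorithm: SHA-256   # keys are stored as hashes
      min-length: 32
```
Production pitfalls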
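Pitfall 1: Rate limiting fails when Redis goes down
Early on we had a hard dependency on Redis, so whenever Redis hiccupped, the entire AI API went down with it. We later switched to a degradation strategy: when Redis is unavailable, fall back automatically to a local in-memory limiter (Caffeine), giving up some accuracy in exchange for availability.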
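A minimal sketch of that degradation, built on a counter abstraction of our own (`WindowCounter`, `LocalWindowCounter` and `DegradingCounter` are hypothetical names). To stay dependency-free it uses a `ConcurrentHashMap` in place of the Caffeine cache the team actually used, and Redis is stubbed out behind the interface.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical abstraction: increment a windowed counter and return the new value.
interface WindowCounter {
    long increment(String key);
}

// Per-instance fallback. Each app node counts on its own, so with N nodes a key may get
// up to N times its configured limit: that is the accuracy traded for availability.
final class LocalWindowCounter implements WindowCounter {
    // Keys embed the time window, so stale entries pile up; a Caffeine cache with
    // expireAfterWrite would evict them, this plain map does not.
    private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

    @Override
    public long increment(String key) {
        return counters.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
    }
}

// Try the shared (Redis-backed) counter first; on a runtime failure, count locally instead.
final class DegradingCounter implements WindowCounter {
    private final WindowCounter primary;
    private final WindowCounter fallback;

    DegradingCounter(WindowCounter primary, WindowCounter fallback) {
        this.primary = primary;
        this.fallback = fallback;
    }

    @Override
    public long increment(String key) {
        try {
            return primary.increment(key);
        } catch (RuntimeException e) { // e.g. a Redis connection failure or command timeout
            return fallback.increment(key);
        }
    }
}
```

One design point: as written, every request still waits for the Redis call to fail (up to the 3000ms timeout in the configuration) before falling back. A circuit breaker, such as the one Resilience4j provides, can skip Redis entirely while it is known to be down.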
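Pitfall 2: Inaccurate token counts for streaming responses
In Spring AI's streaming mode, individual chunks don't carry complete usage information. The fix is to read the totals from the last chunk (when the provider sends them), or to estimate locally with tiktoken4j.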
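The accounting logic behind either option is the same: keep the last usage figure any chunk reports, and fall back to an estimate only when none arrives. Whether a final chunk carries usage at all depends on the provider and its options; OpenAI, for example, only includes usage in a stream when it is requested via `stream_options`. The sketch below is a hypothetical, framework-free accumulator: `StreamChunk` is an invented stand-in for a streamed response, and the estimator parameter is where a real tokenizer would plug in.

```java
import java.util.function.ToLongFunction;

// Hypothetical chunk shape: some text plus an optional usage total (typically only on the last chunk).
record StreamChunk(String text, Long totalTokens) {}

final class StreamUsageAccumulator {
    private final StringBuilder text = new StringBuilder();
    private Long reportedTotal; // last non-null usage any chunk carried

    void accept(StreamChunk chunk) {
        if (chunk.text() != null) {
            text.append(chunk.text());
        }
        if (chunk.totalTokens() != null) {
            reportedTotal = chunk.totalTokens();
        }
    }

    // Prefer the provider's own count; otherwise estimate once over the whole text
    // (estimating per chunk would round many tiny chunks down to zero).
    long tokensToCharge(ToLongFunction<String> estimator) {
        return reportedTotal != null ? reportedTotal : estimator.applyAsLong(text.toString());
    }
}
```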
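Pitfall 3: Emergency response to a leaked API key
The admin API must offer immediate revocation, and revoked keys should also be cached in Redis so that you don't have to query the database on every request.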
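A sketch of that revocation fast path under stated assumptions: a `Set` stands in for a shared Redis set of revoked key hashes, and a `Predicate` stands in for the database lookup. `RevocationGuard` and its methods are hypothetical names. Because the revocation cache is consulted first, a leaked key that keeps hammering the API costs no database round trips once it has been revoked.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

// Hypothetical revocation check. The Set stands in for a Redis set of revoked key hashes
// shared by all nodes; the Predicate stands in for the database lookup.
final class RevocationGuard {
    private final Set<String> revokedHashes = ConcurrentHashMap.newKeySet();
    private final Predicate<String> databaseSaysValid;
    private final AtomicInteger databaseLookups = new AtomicInteger();

    RevocationGuard(Predicate<String> databaseSaysValid) {
        this.databaseSaysValid = databaseSaysValid;
    }

    // Admin "revoke now": takes effect on the very next request
    // (the database row would be set to REVOKED as well; omitted here).
    void revoke(String keyHash) {
        revokedHashes.add(keyHash);
    }

    boolean isUsable(String keyHash) {
        if (revokedHashes.contains(keyHash)) {
            return false;                          // fast path, no database round trip
        }
        databaseLookups.incrementAndGet();
        return databaseSaysValid.test(keyHash);
    }

    int databaseLookups() {
        return databaseLookups.get();
    }
}
```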
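Pitfall 4: Counters over-admitting under concurrent requests
INCR by itself is atomic; trouble starts when the work is split across separate commands. A GET followed by an INCR lets concurrent requests all pass the check before any of them increments ("the check passed, yet the count overshot"), and an INCR followed by a separate EXPIRE can leave a counter with no TTL if the second command never runs. Running INCR and EXPIRE together in a Lua script, and comparing the value INCR returned rather than a value read beforehand, closes both gaps; the rate-limiter code above follows this pattern. The token budget pre-check is still a read-then-act check, so treat the token limits as soft limits.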
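The difference between check-then-act and increment-then-compare is easy to reproduce without Redis. In this hypothetical sketch (`CounterRace` is an invented name), an `AtomicLong` plays the role of the Redis counter: `naiveTryAcquire` reads and then increments in two steps, like a GET followed by an INCR, while `atomicTryAcquire` increments first and judges the value it got back, rolling back on rejection, the same INCR, compare, DECR sequence a Lua script can run in one step.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

final class CounterRace {
    // Racy: two threads can both read 99 < 100 and both proceed (check-then-act).
    static boolean naiveTryAcquire(AtomicLong counter, long limit) {
        if (counter.get() < limit) {
            counter.incrementAndGet();
            return true;
        }
        return false;
    }

    // Safe: increment first, compare the value this caller itself produced, roll back on rejection.
    static boolean atomicTryAcquire(AtomicLong counter, long limit) {
        long after = counter.incrementAndGet();
        if (after > limit) {
            counter.decrementAndGet();
            return false;
        }
        return true;
    }

    // Release `attempts` concurrent calls at once and count how many were admitted.
    static int admitted(int attempts, long limit, boolean atomic) {
        AtomicLong counter = new AtomicLong();
        AtomicInteger allowed = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(16);
        for (int i = 0; i < attempts; i++) {
            pool.submit(() -> {
                start.await(); // line everyone up so the calls really overlap
                if (atomic ? atomicTryAcquire(counter, limit) : naiveTryAcquire(counter, limit)) {
                    allowed.incrementAndGet();
                }
                return null;
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return allowed.get();
    }
}
```

The atomic version admits exactly the limit however the threads interleave; the naive one admits at least the limit and, under contention, may admit more.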
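Summary
This setup has been running on our team for half a year and has fended off several scanning attacks and accidental misuses. The core idea fits in one sentence: think about protecting an AI API in terms of cost, not just security.
Every unauthorized AI call is burning your money.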
