Spring AI + Spring Security: Authentication and Rate Limiting for AI Endpoints in Practice

Lao Zhang, 2026/4/30, about 10 minutes

Intended audience: Java engineers with Spring Boot experience who have shipped, or are about to ship, AI features. Reading time: about 18 minutes.

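Lao Wang's AI Endpoint Got Hammered

Last month, Lao Wang excitedly launched his company's internal AI Q&A system. The first three days went well. On the morning of the fourth day he opened the monitoring dashboard, and his face fell.

The OpenAI bill showed an abnormal charge of 2,000 US dollars. Tracing it back, he found that a front-end colleague had accidentally posted the API address in a tech chat group, and a few people had then written scripts to call it nonstop. There was no authentication, no rate limiting, no protection of any kind. The AI endpoint sat exposed on the public internet until the account balance ran dry and the calls stopped on their own.

Lao Wang came to me and said: "When I built ordinary REST APIs, the worst case was someone freeloading on a bit of server capacity. But an AI endpoint... every single call costs real money."

He is right. Protecting an AI endpoint is a different order of problem from protecting a traditional one: an unprotected AI endpoint can run up a ruinous bill overnight.

Today I will walk through the full authentication and rate limiting setup our team uses.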

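Why AI Endpoints Need Special Treatment

When a traditional REST API is abused, the worst outcome is high CPU load or saturated bandwidth, and the cost stays relatively fixed.

AI endpoints are different:

Risk	Traditional API	AI endpoint
Direct cost of malicious calls	Server resources	Token fees (direct monetary loss)
Latency per call	Milliseconds	Seconds to minutes
Response size	Usually small	Large volumes of text, streamed
Detecting abuse	Visible in traffic monitoring	May go unnoticed until the bill arrives
Rate limiting dimension	Request count	Request count plus token count

So for AI endpoints we need:

Mandatory authentication: unknown callers are turned away at the door
Fine-grained authorization: different users may call different models
Two-dimensional rate limiting: by request count and by token consumption
Complete audit logs: every call is traceable after the fact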

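Overall Architecture

The core flow has three layers: the authentication layer confirms identity, the rate limiting layer controls usage, and the monitoring layer records consumption.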
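
The ordering of the three layers matters: each layer can short-circuit the request before the next one runs. The following is a minimal, framework-free sketch of that ordering, not the team's implementation. `GatewaySketch`, its in-memory maps, and the single fixed request limit are hypothetical names and simplifications chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the three layers:
// authentication -> rate limiting -> usage recording.
class GatewaySketch {
    private final Map<String, String> ownersByKey = new HashMap<>();    // API key -> owner
    private final Map<String, Integer> requestCounts = new HashMap<>(); // owner -> requests so far
    private final List<String> auditLog = new ArrayList<>();
    private final int requestLimit;

    GatewaySketch(int requestLimit) {
        this.requestLimit = requestLimit;
    }

    void registerKey(String apiKey, String owner) {
        ownersByKey.put(apiKey, owner);
    }

    // Returns an HTTP-like status code: 401, 429 or 200.
    int handle(String apiKey, int tokensUsed) {
        // Layer 1: authentication. Unknown callers never reach the model.
        String owner = ownersByKey.get(apiKey);
        if (owner == null) {
            return 401;
        }
        // Layer 2: rate limiting. Count first, then compare with the limit.
        int count = requestCounts.merge(owner, 1, Integer::sum);
        if (count > requestLimit) {
            return 429;
        }
        // Layer 3: monitoring. Record what the call consumed.
        auditLog.add(owner + " used " + tokensUsed + " tokens");
        return 200;
    }

    List<String> auditLog() {
        return auditLog;
    }
}
```

Only requests that pass both earlier layers reach the model and produce an audit entry, which is why a rejected caller costs nothing in token fees.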

Hands-On Implementation: Building It from Scratch

Step 1: Dependencies

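<dependencies>
    <!-- Spring AI (version assumed to come from the spring-ai-bom in dependencyManagement) -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    </dependency>
    
    <!-- Spring Security -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-security</artifactId>
    </dependency>
    
    <!-- Redis for rate limiting -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-redis</artifactId>
    </dependency>
    
    <!-- Resilience4j rate limiting (declare a version unless a BOM in your build manages it) -->
    <dependency>
        <groupId>io.github.resilience4j</groupId>
        <artifactId>resilience4j-spring-boot3</artifactId>
    </dependency>
    
    <!-- AOP for auditing -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-aop</artifactId>
    </dependency>
    
    <!-- JPA, needed by the ApiKey entity in step 2 -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-jpa</artifactId>
    </dependency>
    
    <!-- Lombok, for the @Data, @Builder and @Slf4j annotations used in the listings -->
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <optional>true</optional>
    </dependency>
</dependencies>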

Step 2: The API Key Data Model

import jakarta.persistence.*;
import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;

import java.time.LocalDateTime;
import java.util.Set;

@Entity
@Table(name = "api_keys")
@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
public class ApiKey {
    
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    
    @Column(unique = true, nullable = false)
    private String keyValue;       // hash of the key string (the plaintext is not stored)
    
    @Column(nullable = false)
    private String keyName;        // alias for the key, for easier management
    
    @Column(nullable = false)
    private String ownerId;        // ID of the owning user or application
    
    @Enumerated(EnumType.STRING)
    private KeyStatus status;      // ACTIVE / REVOKED / EXPIRED
    
    // Request-level rate limits
    private Integer rpmLimit;      // max requests per minute (Requests Per Minute)
    private Integer rpdLimit;      // max requests per day (Requests Per Day)
    
    // Token-level rate limits
    private Integer tpmLimit;      // max tokens per minute (Tokens Per Minute)
    private Long tpdLimit;         // max tokens per day (Tokens Per Day)
    
    // Permissions
    @ElementCollection
    @CollectionTable(name = "api_key_permissions")
    private Set<String> allowedModels;   // models this key may call
    
    @Column(nullable = false)
    private LocalDateTime createdAt;
    
    private LocalDateTime expiresAt;     // null means the key never expires
    
    private LocalDateTime lastUsedAt;
    
    public enum KeyStatus {
        ACTIVE, REVOKED, EXPIRED
    }
    
    public boolean isValid() {
        if (status != KeyStatus.ACTIVE) return false;
        if (expiresAt != null && LocalDateTime.now().isAfter(expiresAt)) return false;
        return true;
    }
}
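
The entity stores a hash in `keyValue`, and the configuration later in the article names SHA-256 with a minimum key length of 32. The article does not show how keys are generated or hashed, so the following is only a sketch under assumptions: `ApiKeyHasher`, the `sk-` prefix, and the 32-byte key size are hypothetical choices, not the author's code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.HexFormat;

// Hypothetical helper: generate a random API key and derive the hash to persist.
class ApiKeyHasher {
    private static final SecureRandom RANDOM = new SecureRandom();

    // 32 random bytes encode to 43 URL-safe Base64 characters; "sk-" is an assumed prefix.
    static String generateKey() {
        byte[] bytes = new byte[32];
        RANDOM.nextBytes(bytes);
        return "sk-" + Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // SHA-256 hex digest; this is the value that would be stored in keyValue.
    static String hash(String plaintextKey) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hashed = digest.digest(plaintextKey.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(hashed);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    // Compares the hash of the presented key with the stored hash in constant time.
    static boolean matches(String presentedKey, String storedHash) {
        return MessageDigest.isEqual(
            hash(presentedKey).getBytes(StandardCharsets.UTF_8),
            storedHash.getBytes(StandardCharsets.UTF_8));
    }
}
```

The plaintext key would be shown to its owner once at creation time. Because the stored hash is deterministic and `keyValue` is unique, an incoming key can be hashed and looked up directly by that column.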

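Step 3: Spring Security Configuration

This is the core of the whole setup. We handle API key authentication with a custom filter instead of the traditional username and password flow.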

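import lombok.RequiredArgsConstructor;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.config.annotation.method.configuration.EnableMethodSecurity;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.config.annotation.web.configuration.EnableWebSecurity;
import org.springframework.security.config.annotation.web.configurers.AbstractHttpConfigurer;
import org.springframework.security.config.http.SessionCreationPolicy;
import org.springframework.security.crypto.bcrypt.BCryptPasswordEncoder;
import org.springframework.security.crypto.password.PasswordEncoder;
import org.springframework.security.web.SecurityFilterChain;
import org.springframework.security.web.authentication.UsernamePasswordAuthenticationFilter;

@Configuration
@EnableWebSecurity
@EnableMethodSecurity
@RequiredArgsConstructor
public class SecurityConfig {

    private final ApiKeyAuthFilter apiKeyAuthFilter;

    @Bean
    public SecurityFilterChain filterChain(HttpSecurity http) throws Exception {
        http
            // AI endpoints are stateless, so no HTTP session is created
            .sessionManagement(session -> 
                session.sessionCreationPolicy(SessionCreationPolicy.STATELESS))
            // Disable CSRF (not needed for an API authenticated by a request header)
            .csrf(AbstractHttpConfigurer::disable)
            // Access rules
            .authorizeHttpRequests(auth -> auth
                // Health check is public
                .requestMatchers("/actuator/health").permitAll()
                // API documentation is public
                .requestMatchers("/swagger-ui/**", "/v3/api-docs/**").permitAll()
                // All AI endpoints require authentication
                .requestMatchers("/api/ai/**").authenticated()
                // Admin endpoints require the ADMIN role
                .requestMatchers("/api/admin/**").hasRole("ADMIN")
                .anyRequest().authenticated()
            )
            // Insert the API key filter before UsernamePasswordAuthenticationFilter
            .addFilterBefore(apiKeyAuthFilter, UsernamePasswordAuthenticationFilter.class)
            // Custom error responses for 401 and 403
            .exceptionHandling(ex -> ex
                .authenticationEntryPoint(new ApiKeyAuthenticationEntryPoint())
                .accessDeniedHandler(new ApiKeyAccessDeniedHandler())
            );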
        
        return http.build();
    }

    @Bean
    public PasswordEncoder passwordEncoder() {
        return new BCryptPasswordEncoder();
    }
}

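import com.fasterxml.jackson.databind.ObjectMapper;
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpStatus;
import org.springframework.http.MediaType;
import org.springframework.security.core.context.SecurityContextHolder;
import org.springframework.security.web.authentication.WebAuthenticationDetailsSource;
import org.springframework.stereotype.Component;
import org.springframework.util.StringUtils;
import org.springframework.web.filter.OncePerRequestFilter;

import java.io.IOException;
import java.util.Map;

/**
 * API key authentication filter.
 * Extracts the API key from the request headers and validates it.
 */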
@Component
@RequiredArgsConstructor
@Slf4j
public class ApiKeyAuthFilter extends OncePerRequestFilter {

    private static final String API_KEY_HEADER = "X-API-Key";
    private static final String BEARER_PREFIX = "Bearer ";
    
    private final ApiKeyService apiKeyService;
    private final RateLimiterService rateLimiterService;

    @Override
    protected void doFilterInternal(HttpServletRequest request,
                                    HttpServletResponse response,
                                    FilterChain filterChain) throws ServletException, IOException {
        
        String apiKey = extractApiKey(request);
        
        if (apiKey == null) {
            filterChain.doFilter(request, response);
            return;
        }
        
        try {
            // Validate the API key
            ApiKeyPrincipal principal = apiKeyService.validateAndLoadKey(apiKey);
            
            // Check the request-level rate limits
            RateLimitResult rateLimitResult = rateLimiterService.checkRequestLimit(principal);
            if (!rateLimitResult.isAllowed()) {
                writeRateLimitResponse(response, rateLimitResult);
                return;
            }
            
            // Store the authentication in the SecurityContext
            ApiKeyAuthToken authToken = new ApiKeyAuthToken(principal, principal.getAuthorities());
            authToken.setDetails(new WebAuthenticationDetailsSource().buildDetails(request));
            SecurityContextHolder.getContext().setAuthentication(authToken);
            
            log.debug("API key authenticated: keyName={}, owner={}", 
                principal.getKeyName(), principal.getOwnerId());
            
        } catch (InvalidApiKeyException e) {
            log.warn("Invalid API key: {}", maskKey(apiKey));
            SecurityContextHolder.clearContext();
            writeUnauthorizedResponse(response, e.getMessage());
            return;
        }
        
        filterChain.doFilter(request, response);
    }

    private String extractApiKey(HttpServletRequest request) {
        // Prefer the X-API-Key header
        String headerValue = request.getHeader(API_KEY_HEADER);
        if (StringUtils.hasText(headerValue)) {
            return headerValue.trim();
        }
        
        // Also accept the Bearer token format
        String authHeader = request.getHeader(HttpHeaders.AUTHORIZATION);
        if (StringUtils.hasText(authHeader) && authHeader.startsWith(BEARER_PREFIX)) {
            return authHeader.substring(BEARER_PREFIX.length()).trim();
        }
        
        return null;
    }
    
    private String maskKey(String key) {
        if (key == null || key.length() < 8) return "****";
        return key.substring(0, 4) + "****" + key.substring(key.length() - 4);
    }
    
    private void writeRateLimitResponse(HttpServletResponse response, 
                                         RateLimitResult result) throws IOException {
        response.setStatus(HttpStatus.TOO_MANY_REQUESTS.value());
        response.setContentType(MediaType.APPLICATION_JSON_VALUE);
        response.setHeader("X-RateLimit-Limit", String.valueOf(result.getLimit()));
        response.setHeader("X-RateLimit-Remaining", "0");
        response.setHeader("X-RateLimit-Reset", String.valueOf(result.getResetTime()));
        response.setHeader("Retry-After", String.valueOf(result.getRetryAfterSeconds()));
        
        Map<String, Object> body = Map.of(
            "error", "rate_limit_exceeded",
            "message", result.getMessage(),
            "retry_after", result.getRetryAfterSeconds()
        );
        response.getWriter().write(new ObjectMapper().writeValueAsString(body));
    }
    
    private void writeUnauthorizedResponse(HttpServletResponse response, 
                                            String message) throws IOException {
        response.setStatus(HttpStatus.UNAUTHORIZED.value());
        response.setContentType(MediaType.APPLICATION_JSON_VALUE);
        Map<String, String> body = Map.of("error", "unauthorized", "message", message);
        response.getWriter().write(new ObjectMapper().writeValueAsString(body));
    }
}
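
From the caller's side, the filter expects the key in the `X-API-Key` header (or as a Bearer token) and answers 429 with a `Retry-After` header when a limit is hit. The sketch below shows one way a Java client might build such a request and read the retry delay. It is an illustration only: `AiClientRequests`, the `https://ai.example.com` host, and the JSON field names (inferred from the `ChatRequest` getters) are assumptions.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

// Hypothetical client-side helper for calling the protected chat endpoint.
class AiClientRequests {

    // Builds (but does not send) a chat request that carries the key in X-API-Key.
    static HttpRequest chatRequest(String baseUrl, String apiKey, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/ai/chat"))
            .timeout(Duration.ofSeconds(60))
            .header("Content-Type", "application/json")
            .header("X-API-Key", apiKey)
            .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
            .build();
    }

    // Parses the Retry-After value (in seconds) of a 429 response.
    // Falls back to 1 second when the header is missing or malformed.
    static long retryAfterSeconds(String headerValue) {
        if (headerValue == null) {
            return 1;
        }
        try {
            return Math.max(1, Long.parseLong(headerValue.trim()));
        } catch (NumberFormatException e) {
            return 1;
        }
    }
}
```

A client would pass the built request to `java.net.http.HttpClient.send` and, on status 429, wait for `retryAfterSeconds(...)` before retrying rather than retrying immediately.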

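Step 4: Two-Dimensional Rate Limiting

This is where I hit the most pitfalls. At first we only limited request counts. After launch we found callers deliberately sending very long prompts: their request counts stayed under the limit, but their token consumption was dozens of times that of a normal user. Token consumption therefore has to be limited as well.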

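import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.data.redis.core.script.DefaultRedisScript;
import org.springframework.stereotype.Service;

import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.Collections;

@Service
@RequiredArgsConstructor
@Slf4j
public class RateLimiterService {

    private final RedisTemplate<String, String> redisTemplate;
    private final StringRedisTemplate stringRedisTemplate;
    
    // Redis key prefixes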
    private static final String RPM_KEY = "ratelimit:rpm:";
    private static final String RPD_KEY = "ratelimit:rpd:";
    private static final String TPM_KEY = "ratelimit:tpm:";
    private static final String TPD_KEY = "ratelimit:tpd:";

    /**
     * Checks the request-level limits.
     * Each counter is incremented atomically first and then compared with its
     * limit; a rejected request rolls back the increments it made.
     */
    public RateLimitResult checkRequestLimit(ApiKeyPrincipal principal) {
        String keyId = principal.getKeyId();
        String rpmKey = RPM_KEY + keyId + ":" + getCurrentMinuteWindow();
        String rpdKey = RPD_KEY + keyId + ":" + getCurrentDayWindow();
        
        // Check requests per minute
        if (principal.getRpmLimit() != null) {
            Long currentRpm = incrementWithExpiry(rpmKey, 60);
            
            if (currentRpm > principal.getRpmLimit()) {
                // Roll back the count so rejected requests do not consume quota
                stringRedisTemplate.opsForValue().decrement(rpmKey);
                return RateLimitResult.rejected(
                    "Requests per minute limit exceeded",
                    principal.getRpmLimit(),
                    60 - LocalDateTime.now().getSecond()
                );
            }
        }
        
        // Check requests per day
        if (principal.getRpdLimit() != null) {
            Long currentRpd = incrementWithExpiry(rpdKey, 86400);
            
            if (currentRpd > principal.getRpdLimit()) {
                stringRedisTemplate.opsForValue().decrement(rpdKey);
                // The per-minute counter was already incremented above; undo that too
                if (principal.getRpmLimit() != null) {
                    stringRedisTemplate.opsForValue().decrement(rpmKey);
                }
                return RateLimitResult.rejected(
                    "Daily request limit reached",
                    principal.getRpdLimit(),
                    secondsUntilMidnight()
                );
            }
        }
        
        return RateLimitResult.allowed();
    }

    /**
     * Reports token consumption (called after the AI response has returned).
     */
    public void reportTokenUsage(String keyId, int promptTokens, int completionTokens) {
        int totalTokens = promptTokens + completionTokens;
        String minuteWindow = getCurrentMinuteWindow();
        String dayWindow = getCurrentDayWindow();
        
        // Update the TPM counter
        String tpmKey = TPM_KEY + keyId + ":" + minuteWindow;
        incrementBy(tpmKey, totalTokens, 60);
        
        // Update the TPD counter
        String tpdKey = TPD_KEY + keyId + ":" + dayWindow;
        incrementBy(tpdKey, totalTokens, 86400);
        
        log.debug("Token usage reported: keyId={}, prompt={}, completion={}, total={}", 
            keyId, promptTokens, completionTokens, totalTokens);
    }

    /**
     * Checks whether any token budget remains (a pre-check before the request).
     */
    public RateLimitResult checkTokenBudget(ApiKeyPrincipal principal) {
        String keyId = principal.getKeyId();
        String minuteWindow = getCurrentMinuteWindow();
        String dayWindow = getCurrentDayWindow();
        
        if (principal.getTpmLimit() != null) {
            String tpmKey = TPM_KEY + keyId + ":" + minuteWindow;
            Long currentTpm = getCurrentCount(tpmKey);
            if (currentTpm >= principal.getTpmLimit()) {
                return RateLimitResult.rejected(
                    "Tokens per minute limit reached, please retry later",
                    principal.getTpmLimit(),
                    60 - LocalDateTime.now().getSecond()
                );
            }
        }
        
        if (principal.getTpdLimit() != null) {
            String tpdKey = TPD_KEY + keyId + ":" + dayWindow;
            Long currentTpd = getCurrentCount(tpdKey);
            if (currentTpd >= principal.getTpdLimit()) {
                return RateLimitResult.rejected(
                    "Daily token quota exhausted",
                    principal.getTpdLimit(),
                    secondsUntilMidnight()
                );
            }
        }
        
        return RateLimitResult.allowed();
    }

    private Long incrementWithExpiry(String key, long ttlSeconds) {
        // A Lua script makes INCR and EXPIRE a single atomic step
        String script = """
            local current = redis.call('INCR', KEYS[1])
            if current == 1 then
                redis.call('EXPIRE', KEYS[1], ARGV[1])
            end
            return current
            """;
        
        DefaultRedisScript<Long> redisScript = new DefaultRedisScript<>(script, Long.class);
        return redisTemplate.execute(redisScript, 
            Collections.singletonList(key), 
            String.valueOf(ttlSeconds));
    }
    
    private void incrementBy(String key, int delta, long ttlSeconds) {
        String script = """
            local current = redis.call('INCRBY', KEYS[1], ARGV[1])
            if current == tonumber(ARGV[1]) then
                redis.call('EXPIRE', KEYS[1], ARGV[2])
            end
            return current
            """;
        
        DefaultRedisScript<Long> redisScript = new DefaultRedisScript<>(script, Long.class);
        redisTemplate.execute(redisScript,
            Collections.singletonList(key),
            String.valueOf(delta), String.valueOf(ttlSeconds));
    }
    
    private Long getCurrentCount(String key) {
        String value = stringRedisTemplate.opsForValue().get(key);
        return value != null ? Long.parseLong(value) : 0L;
    }
    
    private String getCurrentMinuteWindow() {
        return LocalDateTime.now().format(DateTimeFormatter.ofPattern("yyyyMMddHHmm"));
    }
    
    private String getCurrentDayWindow() {
        return LocalDate.now().format(DateTimeFormatter.ofPattern("yyyyMMdd"));
    }
    
    private long secondsUntilMidnight() {
        LocalDateTime now = LocalDateTime.now();
        LocalDateTime midnight = now.toLocalDate().plusDays(1).atStartOfDay();
        return ChronoUnit.SECONDS.between(now, midnight);
    }
}
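
To see how the two dimensions interact without running Redis, the sketch below models one minute window in memory. It follows the same order as the service above (request count first, then token budget), but it is a simplification: `DualWindowLimiter` is a hypothetical class, it tracks only the per-minute limits, and a plain `HashMap` stands in for the Redis counters.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the RPM + TPM checks for a single process.
class DualWindowLimiter {
    private final int rpmLimit;
    private final int tpmLimit;
    private final Map<String, Integer> requests = new HashMap<>(); // "keyId:window" -> request count
    private final Map<String, Integer> tokens = new HashMap<>();   // "keyId:window" -> tokens used

    DualWindowLimiter(int rpmLimit, int tpmLimit) {
        this.rpmLimit = rpmLimit;
        this.tpmLimit = tpmLimit;
    }

    // Admits a request only if both the request count and the token budget allow it.
    synchronized boolean tryAcquire(String keyId, long minuteWindow) {
        String windowKey = keyId + ":" + minuteWindow;
        // Dimension 1: increment the request counter, roll back if over the limit.
        int count = requests.merge(windowKey, 1, Integer::sum);
        if (count > rpmLimit) {
            requests.merge(windowKey, -1, Integer::sum);
            return false;
        }
        // Dimension 2: reject when tokens already reported reach the budget.
        return tokens.getOrDefault(windowKey, 0) < tpmLimit;
    }

    // Called after a response, mirroring reportTokenUsage.
    synchronized void reportTokens(String keyId, long minuteWindow, int used) {
        tokens.merge(keyId + ":" + minuteWindow, used, Integer::sum);
    }
}
```

With limits of 5 requests and 1,000 tokens per minute, a caller who spends 1,200 tokens on the first request is rejected on the second one even though only two requests were made, which is exactly the long-prompt case that request counting alone misses. The token check is a pre-check against usage already reported, so a single oversized request can still overshoot the budget once.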

Step 5: Spring AI Integration and Token Monitoring

@RestController
@RequestMapping("/api/ai")
@RequiredArgsConstructor
@Slf4j
public class AiChatController {

    private final ChatClient chatClient;
    private final RateLimiterService rateLimiterService;
    private final AuditLogService auditLogService;

    @PostMapping("/chat")
    @PreAuthorize("hasAuthority('MODEL:' + #request.model)")  // method-level authorization
    public ResponseEntity<ChatResponse> chat(
            @RequestBody @Validated ChatRequest request,
            @AuthenticationPrincipal ApiKeyPrincipal principal) {
        
        // Check the token budget
        RateLimitResult tokenBudget = rateLimiterService.checkTokenBudget(principal);
        if (!tokenBudget.isAllowed()) {
            return ResponseEntity.status(429)
                .header("Retry-After", String.valueOf(tokenBudget.getRetryAfterSeconds()))
                .body(ChatResponse.error(tokenBudget.getMessage()));
        }
        
        long startTime = System.currentTimeMillis();
        
        try {
            // Call Spring AI
            org.springframework.ai.chat.model.ChatResponse aiResponse = chatClient.prompt()
                .system(buildSystemPrompt(principal))  // inject system-level constraints
                .user(request.getMessage())
                .options(ChatOptionsBuilder.builder()
                    .withModel(request.getModel())
                    .withMaxTokens(Math.min(request.getMaxTokens(), principal.getMaxTokensPerRequest()))
                    .build())
                .call()
                .chatResponse();
            
            // Extract token usage statistics
            Usage usage = aiResponse.getMetadata().getUsage();
            // intValue() avoids an illegal (int) cast when the getter returns a boxed Long,
            // as it does in some Spring AI releases
            int promptTokens = usage.getPromptTokens().intValue();
            int completionTokens = usage.getGenerationTokens().intValue();
            
            // Report token usage
            rateLimiterService.reportTokenUsage(
                principal.getKeyId(), promptTokens, completionTokens);
            
            // Write the audit log
            auditLogService.recordSuccess(
                principal.getKeyId(),
                request.getModel(),
                promptTokens,
                completionTokens,
                System.currentTimeMillis() - startTime
            );
            
            return ResponseEntity.ok(ChatResponse.success(
                aiResponse.getResult().getOutput().getContent(),
                usage
            ));
            
        } catch (Exception e) {
            auditLogService.recordFailure(principal.getKeyId(), request.getModel(), e.getMessage());
            log.error("AI call failed: keyId={}", principal.getKeyId(), e);
            throw e;
        }
    }

    @GetMapping("/stream")
    public Flux<ServerSentEvent<String>> streamChat(
            @RequestParam String message,
            @AuthenticationPrincipal ApiKeyPrincipal principal) {
        
        // Check the token budget
        RateLimitResult tokenBudget = rateLimiterService.checkTokenBudget(principal);
        if (!tokenBudget.isAllowed()) {
            return Flux.just(ServerSentEvent.<String>builder()
                .event("error")
                .data("{\"error\":\"" + tokenBudget.getMessage() + "\"}")
                .build());
        }
        
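        // Streaming response: usage is accumulated and reported when the stream ends
        AtomicInteger completionChars = new AtomicInteger(0);
        // Rough estimate of the prompt size so that very long prompts are still counted
        int promptTokens = message.length() / 4;
        
        return chatClient.prompt()
            .user(message)
            .stream()
            .chatResponse()
            .map(response -> {
                String content = response.getResult().getOutput().getContent();
                // Count characters per chunk and convert once at the end; dividing each
                // small chunk by 4 separately would round most chunks down to zero
                if (content != null) {
                    completionChars.addAndGet(content.length());
                }
                return ServerSentEvent.<String>builder()
                    .event("message")
                    .data(content != null ? content : "")
                    .build();
            })
            .onErrorResume(e -> {
                log.error("Streaming request failed: {}", e.getMessage());
                return Flux.just(ServerSentEvent.<String>builder()
                    .event("error")
                    .data("{\"error\":\"Internal server error\"}")
                    .build());
            })
            .doFinally(signal -> {
                // Runs on completion, error and client cancellation, so output already
                // generated is always reported. About 4 characters per token is a rough
                // estimate and undercounts non-English text such as Chinese.
                int completionTokens = completionChars.get() / 4;
                rateLimiterService.reportTokenUsage(principal.getKeyId(), promptTokens, completionTokens);
                log.debug("Streaming request ended: keyId={}, signal={}, estimatedTokens={}", 
                    principal.getKeyId(), signal, promptTokens + completionTokens);
            });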
    }
    
    private String buildSystemPrompt(ApiKeyPrincipal principal) {
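        // Inject system-level constraints based on the caller's permissions
        return String.format("""
            You are an enterprise AI assistant.
            Current user: %s
            Permission level: %s
            Strictly follow the company's data security rules and do not disclose any internal information.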
            """, principal.getOwnerId(), principal.getRole());
    }
}

Key Configuration Summary

spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o-mini
          max-tokens: 2000
  # Redis configuration (kept under the single top-level spring key,
  # because a YAML document must not repeat a key)
  data:
    redis:
      host: ${REDIS_HOST:localhost}
      port: 6379
      timeout: 3000ms
      lettuce:
        pool:
          max-active: 20
          max-wait: 1000ms

# Rate limiting defaults
app:
  rate-limit:
    default-rpm: 60
    default-rpd: 1000
    default-tpm: 40000
    default-tpd: 500000
    
  security:
    api-key:
      hash-algorithm: SHA-256  # keys are stored as hashes
      min-length: 32
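
The listings do not show where the `app.rate-limit` defaults are applied, and the service skips a check when a per-key limit is null. One plausible reading is that a null per-key limit should fall back to the configured default. The sketch below illustrates that reading only; `RateLimitDefaults` and its methods are hypothetical, and the numbers are copied from the configuration above.

```java
// Hypothetical holder for the app.rate-limit defaults, with per-key overrides.
record RateLimitDefaults(int rpm, int rpd, int tpm, long tpd) {

    // The default values listed under app.rate-limit.
    static RateLimitDefaults fromArticleConfig() {
        return new RateLimitDefaults(60, 1000, 40000, 500000L);
    }

    // A per-key value wins; null means "use the default".
    int effectiveRpm(Integer perKeyRpm) {
        return perKeyRpm != null ? perKeyRpm : rpm;
    }

    int effectiveTpm(Integer perKeyTpm) {
        return perKeyTpm != null ? perKeyTpm : tpm;
    }

    long effectiveTpd(Long perKeyTpd) {
        return perKeyTpd != null ? perKeyTpd : tpd;
    }
}
```

Resolving limits this way means a key created without explicit limits is still capped, instead of being treated as unlimited.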

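Lessons from Production

Pitfall 1: Rate limiting goes down when Redis goes down

Early on we depended on Redis with no fallback, so when Redis became unstable every AI endpoint failed with it. We later added a degradation strategy: when Redis is unavailable, the service switches automatically to a local in-memory rate limiter (Caffeine), trading some accuracy (each instance then counts on its own) for availability.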
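
The degradation strategy can be sketched as a limiter that tries a primary counter and falls back to a local one when the primary throws. This is an illustration under assumptions, not the team's code: `FallbackLimiter` and its `Counter` interface are hypothetical, and a `ConcurrentHashMap` stands in for Caffeine, so entries here never expire the way a Caffeine cache with a write expiry would.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical limiter that degrades to per-instance counting when the
// shared counter (Redis in the article) is unavailable.
class FallbackLimiter {

    // Abstraction over the shared counter; may throw when the backend is down.
    interface Counter {
        long incrementAndGet(String windowKey);
    }

    private final Counter primary;
    // Local stand-in for a Caffeine cache; no eviction in this sketch.
    private final ConcurrentHashMap<String, AtomicInteger> local = new ConcurrentHashMap<>();

    FallbackLimiter(Counter primary) {
        this.primary = primary;
    }

    boolean tryAcquire(String windowKey, int limit) {
        long count;
        try {
            count = primary.incrementAndGet(windowKey);
        } catch (RuntimeException e) {
            // Primary unavailable: count locally instead of failing the request.
            count = local.computeIfAbsent(windowKey, k -> new AtomicInteger()).incrementAndGet();
        }
        return count <= limit;
    }
}
```

In fallback mode each application instance enforces the limit independently, so with N instances a key can briefly get up to N times its quota. That is the accuracy being traded away.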

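Pitfall 2: Inaccurate token counts for streaming responses

In Spring AI's streaming mode, individual chunks do not carry complete usage information. The fix is to read the statistics from the final chunk, or to estimate locally with tiktoken4j. With the OpenAI API, the final usage chunk is only sent when the request enables stream_options.include_usage.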
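
Both options can be combined: use the provider-reported total when a chunk carries one, and estimate only when none does. The sketch below shows that decision in isolation. `StreamUsage` and its `Chunk` record are hypothetical stand-ins for the real streaming response type, and the 4-characters-per-token ratio is the same rough guess used in the controller, not a tokenizer.

```java
import java.util.List;

// Hypothetical resolver for the total token count of a finished stream.
class StreamUsage {

    // Assumed chunk shape: text plus an optional reported total (null when absent).
    record Chunk(String text, Integer totalTokens) {}

    // Prefers the last reported usage value; otherwise estimates from the
    // total character count at roughly 4 characters per token, rounded up.
    static int resolveTotalTokens(List<Chunk> chunks) {
        Integer reported = null;
        int chars = 0;
        for (Chunk chunk : chunks) {
            if (chunk.totalTokens() != null) {
                reported = chunk.totalTokens();
            }
            if (chunk.text() != null) {
                chars += chunk.text().length();
            }
        }
        return reported != null ? reported : (chars + 3) / 4;
    }
}
```

Summing characters across the whole stream before dividing matters, because individual chunks are often shorter than four characters.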

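Pitfall 3: Emergency response after an API key leak

The admin API must include instant revocation, and revoked keys should be cached in Redis so that requests do not hit the database every time.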
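
The revocation cache can be pictured as a fast deny list that is consulted before the database. The sketch below is an assumption-laden illustration: `RevocationCache` is a hypothetical class, an in-memory set stands in for the Redis structure, and the `Predicate` stands in for the database lookup.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

// Hypothetical deny list checked before the (slower) database lookup.
class RevocationCache {
    private final Set<String> revokedHashes = ConcurrentHashMap.newKeySet(); // stands in for Redis
    private final Predicate<String> databaseLookup; // true if the key hash is active in the DB
    private final AtomicInteger databaseHits = new AtomicInteger();

    RevocationCache(Predicate<String> databaseLookup) {
        this.databaseLookup = databaseLookup;
    }

    // Called by the admin API; takes effect for the very next request.
    void revoke(String keyHash) {
        revokedHashes.add(keyHash);
    }

    boolean isUsable(String keyHash) {
        if (revokedHashes.contains(keyHash)) {
            return false; // rejected without touching the database
        }
        databaseHits.incrementAndGet();
        return databaseLookup.test(keyHash);
    }

    int databaseHits() {
        return databaseHits.get();
    }
}
```

A leaked key that is being hammered by scripts is then rejected from the cache alone, so the attack traffic does not turn into database load.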

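Pitfall 4: Counter overshoot under concurrent requests

The counter has to be incremented first and compared afterwards. If the code reads the count, checks it, and only then increments, concurrent requests can all pass the check and push the count over the limit. INCR and EXPIRE also belong in one Lua script so that the pair is atomic and a failure between the two commands cannot leave a counter without a TTL. The code above shows the correct form.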
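
The increment-then-compare half of this can be demonstrated without Redis. In the sketch below, `ConcurrentHashMap.merge` plays the role of the atomic INCR, and many threads compete for a small limit. `AtomicWindowCounter` is a hypothetical class, and the sketch deliberately leaves out the TTL part that the Lua script adds.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical in-memory counter that increments and reads in one atomic step.
class AtomicWindowCounter {
    private final ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();

    // merge() is atomic per key, so each caller sees a unique counter value.
    boolean tryAcquire(String windowKey, int limit) {
        int current = counts.merge(windowKey, 1, Integer::sum);
        return current <= limit;
    }

    // Fires `attempts` concurrent calls and returns how many were admitted.
    static int admittedUnderLoad(int attempts, int limit) {
        AtomicWindowCounter counter = new AtomicWindowCounter();
        AtomicInteger admitted = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        CountDownLatch done = new CountDownLatch(attempts);
        for (int i = 0; i < attempts; i++) {
            pool.submit(() -> {
                if (counter.tryAcquire("demo-key:202601010000", limit)) {
                    admitted.incrementAndGet();
                }
                done.countDown();
            });
        }
        try {
            done.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return admitted.get();
    }
}
```

Because every caller receives a distinct counter value, exactly `limit` calls are admitted no matter how the threads interleave. A get-then-put version of `tryAcquire` would not give that guarantee.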

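Summary

This setup has been running in our team for six months and has stopped multiple scanning attacks and accidental misuse. The core idea fits in one sentence: think about protecting AI endpoints in terms of cost, not only security.

Every unauthorized AI call burns your money.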