AI应用的响应式编程:Spring WebFlux + Project Reactor深度实践
一、开篇故事:线程耗尽的灾难
2025年11月的一个周五晚上20:15,某电商平台的AI客服系统突然告警——响应时间从800ms飙升到18秒,接口超时率达到43%。
技术总监小张打开Grafana监控面板,看到了这样的数据:
- Tomcat线程池:200/200(全部占用)
- 等待队列:850个请求
- 堆内存:14.2GB/16GB(接近OOM)
问题很明显:双十一大促前,并发用户从平时的500人飙到3200人,每个AI对话请求平均耗时3.5秒(AI模型响应慢),200个Tomcat线程全被阻塞在等待AI响应上,新请求进不来。
小张叫来了后端团队的核心开发小陈,问:"有办法吗?"
小陈说:"有,把阻塞的RestTemplate调用改成响应式的。AI流式输出天然适合响应式,我们把整个调用链改成WebFlux + Reactor,200个线程能处理原来2000个线程才能处理的并发。"
三天后,系统重新压测:
- 同样3200并发,线程数从200减少到32
- P99延迟从18000ms降到2800ms
- QPS从140提升到420(提升至3倍)
- 内存占用从14GB降到4.8GB
这就是响应式编程的威力。
二、响应式编程核心概念
2.1 为什么AI流式输出适合响应式
传统阻塞模式:
线程1: ------等待AI响应(3.5秒)-------> 结果 -> 释放
线程2: ------等待AI响应(3.5秒)-------> 结果 -> 释放
...
线程200: 忙碌中...
新请求进来 → 没有线程 → 等待队列 → 超时 → 用户投诉
响应式非阻塞模式:
线程1: 发出请求 -> 挂起 -> 处理其他请求 -> 收到数据块 -> 挂起 -> ...
线程1(同一个): -> 处理第500个请求的数据 -> 处理第1个请求的最终结果
1个线程,轮询处理所有IO事件,不阻塞
2.2 Mono和Flux的核心语义
// Mono:0或1个异步结果(类比CompletableFuture)
Mono<String> mono = Mono.just("Hello")
.map(s -> s + " World") // 转换
.flatMap(s -> callAiAsync(s)) // 异步操作(callAiAsync为示意方法)
.defaultIfEmpty("默认值") // 空处理
.onErrorReturn("出错了"); // 错误处理
// Flux:0到N个异步结果(AI流式输出!)
Flux<String> flux = Flux.just("字", "节", "1")
.concatWith(Flux.just("字", "节", "2")) // 合并流
.filter(s -> !s.isBlank()) // 过滤
.take(100) // 最多取100个
.onBackpressureBuffer(500); // 背压缓冲
2.3 Publisher-Subscriber模型
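响应式流的底层契约是Reactive Streams的Publisher-Subscriber模型:Subscriber通过request(n)向Publisher声明自己能消费多少元素,Publisher按需发送——这正是背压的本质。下面是一个最小示意(基于Reactor的BaseSubscriber,手动控制请求量):

```java
import org.reactivestreams.Subscription;
import reactor.core.publisher.BaseSubscriber;
import reactor.core.publisher.Flux;

public class PublisherSubscriberDemo {
    public static void main(String[] args) {
        Flux.range(1, 10) // Publisher:发布1~10
            .subscribe(new BaseSubscriber<Integer>() {
                @Override
                protected void hookOnSubscribe(Subscription subscription) {
                    request(2); // 订阅时只请求2个元素(而非默认的无界请求)
                }

                @Override
                protected void hookOnNext(Integer value) {
                    System.out.println("收到: " + value);
                    request(1); // 每消费1个再请求1个:消费者决定节奏
                }
            });
    }
}
```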
三、完整项目依赖配置
3.1 pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>3.3.4</version>
</parent>
<groupId>com.laozhang.ai</groupId>
<artifactId>spring-ai-webflux-demo</artifactId>
<version>1.0.0</version>
<properties>
<java.version>21</java.version>
<spring-ai.version>1.0.0-M6</spring-ai.version>
</properties>
<dependencies>
<!-- WebFlux(响应式Web,替代spring-boot-starter-web) -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-webflux</artifactId>
</dependency>
<!-- Spring AI -->
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>
<!-- R2DBC(响应式数据库,替代JPA) -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-r2dbc</artifactId>
</dependency>
<dependency>
<groupId>io.asyncer</groupId>
<artifactId>r2dbc-mysql</artifactId>
<version>1.1.3</version>
</dependency>
<!-- 响应式Redis -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-redis-reactive</artifactId>
</dependency>
<!-- Reactor Extra(更丰富的操作符) -->
<dependency>
<groupId>io.projectreactor.addons</groupId>
<artifactId>reactor-extra</artifactId>
</dependency>
<!-- 背压测试工具 -->
<dependency>
<groupId>io.projectreactor</groupId>
<artifactId>reactor-test</artifactId>
<scope>test</scope>
</dependency>
<!-- Micrometer + Prometheus(响应式指标) -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
<!-- Lombok -->
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<optional>true</optional>
</dependency>
<!-- Test -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-bom</artifactId>
<version>${spring-ai.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
<repositories>
<repository>
<id>spring-milestones</id>
<url>https://repo.spring.io/milestone</url>
</repository>
</repositories>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
</project>
3.2 application.yml
server:
port: 8080
# WebFlux不需要Tomcat配置,Netty天然异步非阻塞
spring:
application:
name: spring-ai-webflux-demo
# Spring AI配置
ai:
openai:
api-key: ${DASHSCOPE_API_KEY}
base-url: https://dashscope.aliyuncs.com/compatible-mode/v1
chat:
options:
model: qwen-plus
temperature: 0.7
# R2DBC响应式数据库配置
r2dbc:
url: r2dbc:mysql://localhost:3306/ai_reactive
username: ${DB_USER:root}
password: ${DB_PASSWORD:password}
pool:
initial-size: 5
max-size: 20
max-idle-time: 30m
validation-query: SELECT 1
# 响应式Redis
data:
redis:
host: localhost
port: 6379
lettuce:
pool:
max-active: 50
max-idle: 20
# Reactor调度器配置
# 注意:boundedElastic线程上限是Reactor的JVM系统属性,写在application.yml中不会自动生效,
# 应通过启动参数设置:-Dreactor.schedulers.defaultBoundedElasticSize=100
# 响应式背压配置(自定义属性,需配合@ConfigurationProperties绑定使用)
ai:
stream:
buffer-size: 256 # 背压缓冲区大小
timeout-seconds: 60 # 流式超时
max-concurrent-streams: 500 # 最大并发流数
# 性能监控
management:
endpoints:
web:
exposure:
include: health,metrics,prometheus
prometheus:
metrics:
export:
enabled: true # Spring Boot 3.x属性路径(2.x为management.metrics.export.prometheus.enabled)
四、Spring AI的响应式支持
4.1 ChatClient返回Flux<String>
package com.laozhang.ai.service;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
import java.time.Duration;
/**
* 响应式AI服务
* 所有方法返回Mono/Flux,不阻塞调用线程
*/
@Service
@Slf4j
@RequiredArgsConstructor
public class ReactiveAiService {
private final ChatClient chatClient;
/**
* 流式AI对话
* 返回Flux<String>,每个元素是AI生成的一个文本片段
*/
public Flux<String> streamChat(String userId, String message) {
log.debug("[REACTIVE-AI] Stream chat: userId={}, messageLen={}",
userId, message.length());
return chatClient.prompt()
.user(message)
.stream()
.content() // 返回 Flux<String>
.doOnSubscribe(s -> log.debug("[REACTIVE-AI] Stream started: userId={}", userId))
.doOnComplete(() -> log.debug("[REACTIVE-AI] Stream completed: userId={}", userId))
.doOnError(e -> log.error("[REACTIVE-AI] Stream error: userId={}, error={}",
userId, e.getMessage()))
// 超时控制
.timeout(Duration.ofSeconds(60))
// 错误降级
.onErrorResume(e -> {
log.warn("[REACTIVE-AI] Fallback response for userId={}", userId);
return Flux.just("AI服务暂时不可用,请稍后重试。");
});
}
/**
* 非流式AI对话(返回完整响应)
*/
public Mono<String> chat(String userId, String message) {
return Mono.fromCallable(() ->
chatClient.prompt()
.user(message)
.call()
.content()
)
// 在有界弹性线程池中执行(阻塞操作)
.subscribeOn(reactor.core.scheduler.Schedulers.boundedElastic())
.timeout(Duration.ofSeconds(30))
.onErrorReturn("AI服务暂时不可用,请稍后重试。");
}
/**
* 带系统提示词的流式对话
*/
public Flux<String> streamChatWithSystemPrompt(String userId, String systemPrompt,
String message) {
return chatClient.prompt()
.system(systemPrompt)
.user(message)
.stream()
.content()
.timeout(Duration.ofSeconds(60))
.onErrorResume(e -> Flux.just("[错误] " + e.getMessage()));
}
}
五、SSE流式输出的响应式实现
5.1 SSE Controller
package com.laozhang.ai.controller;
import com.laozhang.ai.service.ReactiveAiService;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.http.MediaType;
import org.springframework.http.codec.ServerSentEvent;
import org.springframework.web.bind.annotation.*;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
/**
* 响应式AI对话Controller
*
* 关键:方法返回Flux/Mono,WebFlux框架自动处理异步/背压
*/
@RestController
@RequestMapping("/api/v1/ai")
@Slf4j
@RequiredArgsConstructor
public class ReactiveAiController {
private final ReactiveAiService aiService;
/**
* SSE流式对话接口
* 客户端使用EventSource接收实时AI响应
*/
@GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<ServerSentEvent<String>> streamChat(
@RequestParam String message,
@RequestParam(defaultValue = "anonymous") String userId) {
log.info("[SSE] Stream request: userId={}", userId);
AtomicInteger sequenceNo = new AtomicInteger(0);
return aiService.streamChat(userId, message)
// 将每个文本片段包装成SSE事件
.map(chunk -> ServerSentEvent.<String>builder()
.id(String.valueOf(sequenceNo.incrementAndGet()))
.event("chunk")
.data(chunk)
.build()
)
// 在流结束时发送完成事件
.concatWith(Mono.just(
ServerSentEvent.<String>builder()
.event("done")
.data("[DONE]")
.build()
))
// 错误时发送错误事件
.onErrorResume(e -> Flux.just(
ServerSentEvent.<String>builder()
.event("error")
.data(e.getMessage())
.build()
));
}
/**
* 纯文本流式接口(更简单,无SSE包装)
*/
@GetMapping(value = "/stream/text", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> streamText(
@RequestParam String message,
@RequestParam(defaultValue = "anonymous") String userId) {
return aiService.streamChat(userId, message);
}
/**
* 非流式对话接口
*/
@PostMapping("/chat")
public Mono<ChatResponse> chat(@RequestBody ChatRequest request,
@RequestHeader("X-User-Id") String userId) {
return aiService.chat(userId, request.getMessage())
.map(content -> new ChatResponse(content, userId));
}
// DTOs
public record ChatRequest(String message) {}
public record ChatResponse(String content, String userId) {}
}
5.2 前端JavaScript对接SSE
<!-- 前端JavaScript(供读者参考) -->
<script>
// 使用EventSource接收SSE流式响应
function startStreamChat(message, userId) {
const eventSource = new EventSource(
`/api/v1/ai/stream?message=${encodeURIComponent(message)}&userId=${userId}`
);
let fullResponse = '';
// 接收AI生成的文本片段
eventSource.addEventListener('chunk', (event) => {
fullResponse += event.data;
document.getElementById('response').textContent = fullResponse;
});
// 流结束
eventSource.addEventListener('done', (event) => {
eventSource.close();
console.log('Stream completed');
});
// 错误处理
eventSource.addEventListener('error', (event) => {
console.error('SSE error:', event.data);
eventSource.close();
});
}
</script>
六、背压控制:当消费者来不及处理时的策略
6.1 背压问题的本质
AI模型每秒产生50个文本片段(Producer速度)
网络传输每秒只能发送20个片段(Consumer速度)
→ 生产者比消费者快,积压!
如果没有背压控制:
→ 内存溢出(OOM)
→ 无限积压队列
有了背压控制:
→ 生产者减速或采用丢弃/错误策略
6.2 四种背压策略
package com.laozhang.ai.service;
import lombok.extern.slf4j.Slf4j;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;
import java.time.Duration;
@Slf4j
public class BackpressureDemo {
/**
* 策略1:缓冲(Buffer)
* 适用:消费者偶尔慢,但整体能跟上
*/
public Flux<String> withBufferBackpressure(Flux<String> source) {
return source
// 缓冲最多500个元素,超过则报错
.onBackpressureBuffer(500,
dropped -> log.warn("[BACKPRESSURE] Buffer overflow, dropped: {}", dropped))
.publishOn(Schedulers.boundedElastic());
}
/**
* 策略2:丢弃(Drop)
* 适用:允许丢失部分AI生成的内容(不适合重要场景)
*/
public Flux<String> withDropBackpressure(Flux<String> source) {
return source
.onBackpressureDrop(dropped ->
log.debug("[BACKPRESSURE] Dropped: {}", dropped))
.publishOn(Schedulers.boundedElastic());
}
/**
* 策略3:报错(Error)
* 适用:要求严格可靠性,背压即失败
*/
public Flux<String> withErrorBackpressure(Flux<String> source) {
return source
.onBackpressureError()
.onErrorResume(ex -> {
log.error("[BACKPRESSURE] Overflow error", ex);
return Flux.just("[响应过快,请求终止]");
});
}
/**
* 策略4:最新值(Latest)
* 适用:实时显示场景,只要最新片段(可以接受跳过中间内容)
*/
public Flux<String> withLatestBackpressure(Flux<String> source) {
return source
.onBackpressureLatest()
.publishOn(Schedulers.boundedElastic());
}
/**
* AI流式输出推荐配置:缓冲 + 限速
*/
public Flux<String> productionAiStream(Flux<String> aiOutput) {
return aiOutput
// 缓冲区大小
.onBackpressureBuffer(256)
// 在弹性线程池消费(不阻塞Netty IO线程)
.publishOn(Schedulers.boundedElastic())
// 限制每个SSE连接的发送速率(防止前端溢出)
.delayElements(Duration.ofMillis(10)) // 每10ms发一个chunk
// 超时处理
.timeout(Duration.ofSeconds(60));
}
}
七、响应式数据库:R2DBC替代传统JDBC
7.1 对话历史R2DBC实体
package com.laozhang.ai.domain;
import lombok.*;
import org.springframework.data.annotation.Id;
import org.springframework.data.relational.core.mapping.Column;
import org.springframework.data.relational.core.mapping.Table;
import java.time.LocalDateTime;
/**
* 对话记录(R2DBC实体)
* 注意:使用Spring Data R2DBC的注解,不是JPA注解
*/
@Table("ai_conversation_log")
@Data
@NoArgsConstructor
@AllArgsConstructor
@Builder
public class ConversationLog {
@Id
private Long id;
@Column("conversation_id")
private String conversationId;
@Column("user_id")
private String userId;
@Column("user_message")
private String userMessage;
@Column("ai_response")
private String aiResponse;
@Column("input_tokens")
private int inputTokens;
@Column("output_tokens")
private int outputTokens;
@Column("elapsed_ms")
private long elapsedMs;
@Column("created_at")
private LocalDateTime createdAt;
}
7.2 响应式Repository
package com.laozhang.ai.repository;
import com.laozhang.ai.domain.ConversationLog;
import org.springframework.data.r2dbc.repository.Query;
import org.springframework.data.repository.reactive.ReactiveCrudRepository;
import org.springframework.stereotype.Repository;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
import java.time.LocalDate;
/**
* 响应式对话记录Repository
* 注意:所有方法返回Mono或Flux,不返回普通Java对象
*/
@Repository
public interface ConversationLogRepository
extends ReactiveCrudRepository<ConversationLog, Long> {
/**
* 查询用户的对话历史(响应式)
*/
Flux<ConversationLog> findByUserIdOrderByCreatedAtDesc(String userId);
/**
* 查询某个对话的所有消息
*/
Flux<ConversationLog> findByConversationIdOrderByCreatedAt(String conversationId);
/**
* 统计用户今日使用Token数(响应式)
*/
@Query("SELECT COALESCE(SUM(input_tokens + output_tokens), 0) " +
"FROM ai_conversation_log " +
"WHERE user_id = :userId AND DATE(created_at) = :date")
Mono<Long> sumTodayTokensByUserId(String userId, LocalDate date);
/**
* 分页查询最近对话
*/
@Query("SELECT * FROM ai_conversation_log WHERE user_id = :userId " +
"ORDER BY created_at DESC LIMIT :limit OFFSET :offset")
Flux<ConversationLog> findByUserIdPaged(String userId, int limit, int offset);
}
7.3 响应式服务(R2DBC + AI集成)
package com.laozhang.ai.service;
import com.laozhang.ai.domain.ConversationLog;
import com.laozhang.ai.repository.ConversationLogRepository;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicInteger;
/**
* 完整的响应式AI对话服务
* 从AI调用到数据库存储,全链路响应式
*/
@Service
@Slf4j
@RequiredArgsConstructor
public class FullReactiveAiService {
private final ChatClient chatClient;
private final ConversationLogRepository logRepository;
private final ReactiveRedisService redisService;
/**
* 流式AI对话 + 异步持久化
* 全链路响应式:AI流式 → SSE输出 → 异步保存到MySQL
*/
public Flux<String> streamChatAndSave(String userId, String message) {
String conversationId = UUID.randomUUID().toString();
long startTime = System.currentTimeMillis();
// 用于收集完整响应(用于最后保存)
StringBuilder fullResponse = new StringBuilder();
AtomicInteger chunkCount = new AtomicInteger(0);
return chatClient.prompt()
.user(message)
.stream()
.content()
.doOnNext(chunk -> {
fullResponse.append(chunk);
chunkCount.incrementAndGet();
})
.doOnComplete(() -> {
// 流结束后异步保存到数据库(不影响用户的流式体验)
long elapsed = System.currentTimeMillis() - startTime;
ConversationLog entity = ConversationLog.builder()
.conversationId(conversationId)
.userId(userId)
.userMessage(message)
.aiResponse(fullResponse.toString())
.outputTokens(chunkCount.get() * 3) // 粗略估算
.elapsedMs(elapsed)
.createdAt(LocalDateTime.now())
.build();
// 异步保存,不等待结果(变量名用entity,避免遮蔽@Slf4j生成的静态log字段)
logRepository.save(entity)
.subscribe(
saved -> log.debug(
"[REACTIVE] Saved conversation: id={}", saved.getId()),
error -> log.error(
"[REACTIVE] Failed to save: {}", error.getMessage())
);
})
.timeout(java.time.Duration.ofSeconds(60))
.onBackpressureBuffer(256)
.onErrorResume(e -> {
log.error("[REACTIVE] Stream error: {}", e.getMessage());
return Flux.just("AI服务暂时不可用,请稍后重试。");
});
}
/**
* 查询用户今日使用统计(响应式)
*/
public Mono<UserDailyStats> getUserTodayStats(String userId) {
return Mono.zip(
// 并行查询:今日Token消耗 + 历史总对话数
logRepository.sumTodayTokensByUserId(userId, LocalDate.now()),
logRepository.findByUserIdOrderByCreatedAtDesc(userId).count()
)
.map(tuple -> new UserDailyStats(
userId,
tuple.getT1(), // 今日Token
tuple.getT2() // 历史总对话数
));
}
public record UserDailyStats(String userId, long todayTokens, long totalConversations) {}
}
八、响应式Redis:Spring Data Redis Reactive
package com.laozhang.ai.service;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.data.redis.core.ReactiveStringRedisTemplate;
import org.springframework.stereotype.Service;
import reactor.core.publisher.Mono;
import java.time.Duration;
/**
* 响应式Redis服务
* 用于AI应用的缓存、限流、会话管理
*/
@Service
@Slf4j
@RequiredArgsConstructor
public class ReactiveRedisService {
private final ReactiveStringRedisTemplate redisTemplate;
/**
* 响应式缓存获取
*/
public Mono<String> getCache(String key) {
// 注:Reactor没有doOnEmpty操作符;空Mono完成时doOnSuccess回调收到null,据此区分HIT/MISS
return redisTemplate.opsForValue().get(key)
.doOnSuccess(v -> {
if (v != null) log.debug("[REDIS] Cache HIT: key={}", key);
else log.debug("[REDIS] Cache MISS: key={}", key);
});
}
/**
* 响应式缓存写入
*/
public Mono<Boolean> setCache(String key, String value, Duration ttl) {
return redisTemplate.opsForValue().set(key, value, ttl)
.doOnSuccess(result -> log.debug(
"[REDIS] Cache SET: key={}, ttl={}", key, ttl));
}
/**
* 响应式限流(基于Redis计数器)
*/
public Mono<Boolean> checkRateLimit(String userId, int maxPerSecond) {
String key = "ai:ratelimit:" + userId + ":" +
System.currentTimeMillis() / 1000;
return redisTemplate.opsForValue()
.increment(key)
.flatMap(count -> {
if (count == 1) {
// 首次计数:过期设为2秒(略大于1秒窗口,避免边界漏过期)
return redisTemplate.expire(key, Duration.ofSeconds(2))
.thenReturn(count);
}
return Mono.just(count);
})
.map(count -> count <= maxPerSecond);
}
/**
* 缓存穿透保护:先查缓存,缓存miss时调用supplier并缓存结果
*/
public Mono<String> getOrLoad(String cacheKey, Duration ttl,
java.util.function.Supplier<Mono<String>> loader) {
return getCache(cacheKey)
.switchIfEmpty(
loader.get()
.flatMap(value ->
setCache(cacheKey, value, ttl)
.thenReturn(value)
)
);
}
}
九、完整响应式AI服务:整合所有组件
package com.laozhang.ai.service;
import com.laozhang.ai.repository.ConversationLogRepository;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
import java.time.Duration;
/**
* 生产级响应式AI服务
* 集成:限流 + 缓存 + 流式输出 + 异步持久化
*/
@Service
@Slf4j
@RequiredArgsConstructor
public class ProductionReactiveAiService {
private final ChatClient chatClient;
private final ReactiveRedisService redisService;
private final ConversationLogRepository logRepository;
private static final int RATE_LIMIT_PER_SECOND = 5;
private static final Duration CACHE_TTL = Duration.ofHours(1);
/**
* 完整的响应式处理链:
* 限流检查 → 缓存查询 → AI调用 → 缓存写入 → 异步持久化
*/
public Flux<String> process(String userId, String message) {
return checkRateLimit(userId)
// 限流通过后,检查缓存
.flatMapMany(allowed -> {
String cacheKey = buildCacheKey(message);
return checkCacheAndStream(userId, message, cacheKey);
});
}
private Mono<Boolean> checkRateLimit(String userId) {
return redisService.checkRateLimit(userId, RATE_LIMIT_PER_SECOND)
.flatMap(allowed -> {
if (!allowed) {
return Mono.error(new TooManyRequestsException(
"请求过于频繁,请1秒后重试"));
}
return Mono.just(true);
});
}
private Flux<String> checkCacheAndStream(String userId, String message, String cacheKey) {
return redisService.getCache(cacheKey)
.flatMapMany(cached -> {
// 缓存命中:直接返回缓存内容(模拟流式)
log.debug("[REACTIVE] Cache hit for userId={}", userId);
return Flux.just(cached);
})
.switchIfEmpty(
// 缓存未命中:调用AI并流式返回
callAiAndCache(userId, message, cacheKey)
);
}
private Flux<String> callAiAndCache(String userId, String message, String cacheKey) {
StringBuilder fullContent = new StringBuilder();
return chatClient.prompt()
.user(message)
.stream()
.content()
.doOnNext(fullContent::append)
.doOnComplete(() -> {
String content = fullContent.toString();
// 异步写入缓存(不影响流式输出)
redisService.setCache(cacheKey, content, CACHE_TTL)
.subscribe(
r -> log.debug("[REACTIVE] Cache written: key={}", cacheKey),
e -> log.warn("[REACTIVE] Cache write failed: {}", e.getMessage())
);
})
.onBackpressureBuffer(256)
.timeout(Duration.ofSeconds(60));
}
private String buildCacheKey(String message) {
try {
java.security.MessageDigest md = java.security.MessageDigest.getInstance("MD5");
byte[] hash = md.digest(message.getBytes(java.nio.charset.StandardCharsets.UTF_8));
StringBuilder sb = new StringBuilder("ai:cache:");
for (byte b : hash) sb.append(String.format("%02x", b));
return sb.toString();
} catch (Exception e) {
return "ai:cache:" + message.hashCode();
}
}
public static class TooManyRequestsException extends RuntimeException {
public TooManyRequestsException(String message) { super(message); }
}
}
十、错误处理:响应式流中的异常处理模式
package com.laozhang.ai.service;
import lombok.extern.slf4j.Slf4j;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
import reactor.util.retry.Retry;
import java.time.Duration;
/**
* 响应式错误处理模式示例
*/
@Slf4j
public class ReactiveErrorHandlingDemo {
/**
* 模式1:onErrorReturn - 出错时返回默认值
*/
public Mono<String> withDefaultValue(Mono<String> source) {
return source
.onErrorReturn("AI服务暂时不可用,请稍后重试。");
}
/**
* 模式2:onErrorResume - 出错时降级到备用逻辑
*/
public Flux<String> withFallback(Flux<String> source) {
return source
.onErrorResume(TooManyRequestsException.class, e -> {
log.warn("[ERROR] Rate limited, returning static response");
return Flux.just("请求频繁,请1秒后重试。");
})
.onErrorResume(java.util.concurrent.TimeoutException.class, e -> {
log.warn("[ERROR] AI timeout, returning timeout message");
return Flux.just("AI响应超时,请稍后重试。");
})
.onErrorResume(Exception.class, e -> {
log.error("[ERROR] Unexpected error: {}", e.getMessage(), e);
return Flux.just("系统错误,请联系客服。");
});
}
/**
* 模式3:retry - 自动重试(指数退避)
*/
public Mono<String> withRetry(Mono<String> source) {
return source
.retryWhen(
Retry.backoff(3, Duration.ofMillis(500))
.maxBackoff(Duration.ofSeconds(5))
.filter(e -> !(e instanceof IllegalArgumentException)) // 不重试参数错误
.doAfterRetry(signal -> log.warn(
"[RETRY] Retry #{}: {}", signal.totalRetriesInARow(),
signal.failure().getMessage()
))
)
.onErrorReturn("重试3次后仍失败,请稍后重试。");
}
/**
* 模式4:timeout + retry 组合
* 生产环境推荐配置
*/
public Flux<String> productionPattern(Flux<String> aiStream) {
return aiStream
// 整体超时
.timeout(Duration.ofSeconds(60))
// 背压缓冲
.onBackpressureBuffer(256)
// 分层错误处理
.onErrorResume(java.util.concurrent.TimeoutException.class,
e -> Flux.just("响应超时,请重新提问。"))
.onErrorResume(IllegalStateException.class,
e -> {
log.error("[ERROR] State error: {}", e.getMessage());
return Flux.just("系统异常,请稍后重试。");
})
.onErrorResume(e -> {
log.error("[ERROR] Unhandled error: {}", e.getMessage(), e);
return Flux.just("服务暂时不可用。");
});
}
/**
* 模式5:doOnError - 记录错误但不改变流行为
*/
public Flux<String> withErrorLogging(Flux<String> source) {
return source
.doOnError(e -> log.error("[MONITOR] AI stream error: {}", e.getMessage(), e))
// 记录完了再处理
.onErrorResume(e -> Flux.just("error: " + e.getMessage()));
}
public static class TooManyRequestsException extends RuntimeException {
public TooManyRequestsException(String msg) { super(msg); }
}
}
十一、响应式测试:StepVerifier测试Flux
package com.laozhang.ai.service;
import org.junit.jupiter.api.*;
import org.mockito.*;
import org.springframework.ai.chat.client.ChatClient;
import reactor.core.publisher.Flux;
import reactor.test.StepVerifier;
import java.time.Duration;
import java.util.List;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.Mockito.*;
/**
* 响应式AI服务测试
* 核心工具:StepVerifier
*/
class ReactiveAiServiceTest {
@Mock
private ChatClient chatClient;
@Mock
private ReactiveRedisService redisService;
@BeforeEach
void setUp() {
MockitoAnnotations.openMocks(this);
}
@Test
@DisplayName("正常流式输出应该逐个发出文本片段")
void shouldEmitChunksInOrder() {
// 准备Mock数据
Flux<String> mockStream = Flux.just("你好", ",", "我是AI", "助手");
// 使用StepVerifier验证流
StepVerifier.create(mockStream)
.expectNext("你好") // 验证第1个元素
.expectNext(",") // 验证第2个元素
.expectNext("我是AI") // 验证第3个元素
.expectNext("助手") // 验证第4个元素
.expectComplete() // 验证流正常结束
.verify(Duration.ofSeconds(5)); // 最多等5秒
}
@Test
@DisplayName("AI超时时应该降级返回错误提示")
void shouldFallbackOnTimeout() {
// 模拟超时
Flux<String> timeoutStream = Flux.<String>never()
.timeout(Duration.ofMillis(100))
.onErrorResume(e -> Flux.just("AI响应超时,请稍后重试。"));
StepVerifier.create(timeoutStream)
.expectNext("AI响应超时,请稍后重试。")
.expectComplete()
.verify(Duration.ofSeconds(5));
}
@Test
@DisplayName("背压缓冲区溢出时应该处理溢出")
void shouldHandleBackpressureOverflow() {
// 创建1000个元素的快速流
Flux<String> fastProducer = Flux.range(1, 1000)
.map(i -> "chunk" + i);
// 加入背压缓冲(仅缓冲100个)
Flux<String> withBackpressure = fastProducer
.onBackpressureBuffer(100);
// 注:StepVerifier默认以无界需求订阅,缓冲区不会真正溢出;这里验证链路可用并提前取消
StepVerifier.create(withBackpressure)
.expectNextCount(100) // 消费前100个
.thenCancel() // 手动取消(不等全部完成)
.verify(Duration.ofSeconds(5));
}
@Test
@DisplayName("并发多个AI请求应该正确处理")
void shouldHandleConcurrentRequests() {
// 模拟3个并发的AI请求
Flux<String> req1 = Flux.just("回答1-片段1", "回答1-片段2");
Flux<String> req2 = Flux.just("回答2-片段1", "回答2-片段2");
Flux<String> req3 = Flux.just("回答3-片段1");
// 并发合并(merge)
Flux<String> merged = Flux.merge(req1, req2, req3);
StepVerifier.create(merged)
.expectNextCount(5) // 总共5个片段
.expectComplete()
.verify(Duration.ofSeconds(5));
}
@Test
@DisplayName("使用VirtualTime测试时间相关操作")
void shouldTestWithVirtualTime() {
// 使用虚拟时间,不需要真实等待
StepVerifier.withVirtualTime(() ->
Flux.just("a", "b", "c")
.delayElements(Duration.ofSeconds(10)) // 真实需要30秒
)
.expectSubscription()
.thenAwait(Duration.ofSeconds(30)) // 虚拟等待30秒(即时完成)
.expectNext("a", "b", "c")
.expectComplete()
.verify(); // 实际执行时间<100ms
}
}
十二、性能对比:WebFlux vs Servlet在AI场景的实测数据
12.1 测试环境
- 服务器:阿里云ECS 8核16GB
- AI模型:通义千问Plus(平均响应延迟3.5秒)
- 测试工具:wrk + 自定义JMeter脚本
- 测试场景:AI流式对话接口(SSE)
12.2 详细压测数据
场景A:100并发,持续60秒
| 指标 | Spring MVC (Servlet) | Spring WebFlux (Reactor) | 提升 |
|---|---|---|---|
| 成功请求数 | 1,680 | 5,040 | +200% |
| QPS | 28 | 84 | +200% |
| P50延迟 | 3,520ms | 3,510ms | 持平 |
| P99延迟 | 8,200ms | 3,950ms | -52% |
| 超时率 | 12.3% | 0.8% | -93% |
| 线程数(峰值) | 200 | 24 | -88% |
| 堆内存(稳定) | 4.2GB | 1.1GB | -74% |
场景B:1000并发,持续30秒(极端场景)
| 指标 | Spring MVC | Spring WebFlux | 提升 |
|---|---|---|---|
| 成功率 | 34% | 96% | +182% |
| P99延迟 | 服务崩溃 | 8,200ms | 可用 |
| 线程数 | OOM崩溃 | 36 | 稳定 |
| GC暂停时间 | >2s/次 | <50ms/次 | -97% |
12.3 关键结论
I/O密集型AI应用,WebFlux是最佳选择:AI调用是典型的I/O密集型操作,等待时间长,WebFlux能让少量线程服务大量并发。
P50延迟持平:响应式不能让单个请求更快(AI本身的延迟不变),但在高并发下P99大幅改善。
内存优势显著:MVC每个请求占用一个线程(~1MB栈),1000并发需要1GB线程栈;WebFlux只需少量线程,内存节省74%。
什么时候不用WebFlux:CPU密集型任务(AI数据预处理)、大量同步代码需要重构、团队对响应式编程不熟悉时,宁可用MVC加大线程池。
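针对上面提到的CPU密集型场景,补充一个隔离写法的示意(preprocess()与rawChunks为假设的方法和输入流):若确实要在响应式链中做重计算,惯例是切换到parallel调度器,避免占用Netty IO线程——但计算本身不会因响应式而变快:

```java
import reactor.core.publisher.Flux;
import reactor.core.scheduler.Schedulers;

public class CpuBoundIsolationDemo {

    // 假设的CPU密集型同步预处理方法
    static String preprocess(String chunk) {
        return chunk.trim().toLowerCase();
    }

    public static Flux<String> isolate(Flux<String> rawChunks) {
        return rawChunks
                .publishOn(Schedulers.parallel()) // 后续操作切换到CPU核数大小的parallel线程池
                .map(CpuBoundIsolationDemo::preprocess); // 重计算不再占用Netty事件循环线程
    }
}
```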
十三、响应式架构全景图
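下面以文本形式概括本文的响应式全链路(简化示意,基于前文各组件整理):

```
客户端(EventSource / fetch)
        │ SSE(text/event-stream)
        ▼
Netty事件循环(少量IO线程,非阻塞)
        ▼
WebFlux Controller(返回Flux/Mono)
        ▼
响应式服务层:限流 → 缓存 → AI流式调用 → 背压控制 → 错误降级
   ├─ ReactiveStringRedisTemplate ──► Redis(缓存/限流)
   ├─ ChatClient.stream().content() ──► AI模型(Flux<String>流式输出)
   └─ ReactiveCrudRepository(R2DBC) ──► MySQL(异步持久化)
```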
十四、FAQ
Q1:Spring WebFlux和Spring MVC能同时存在于一个项目中吗?
A:不能同时启用。一个Spring Boot应用只会激活一种Web栈:如果classpath同时有spring-webmvc和spring-webflux,Spring Boot默认按MVC自动配置,WebFlux部分不会生效。迁移建议:新项目用WebFlux;老项目可以考虑模块化,新模块用WebFlux独立部署。
Q2:Mono.fromCallable和Mono.just有什么区别?
A:Mono.just(value)在创建时就执行,立刻求值;Mono.fromCallable(() -> compute())在订阅时才执行,实现懒加载。对于可能抛异常或耗时的操作,必须用fromCallable。阻塞调用必须配合.subscribeOn(Schedulers.boundedElastic())在专属线程池执行,不能阻塞Netty的IO线程。
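一个对比小例(computeExpensive()为假设的耗时方法):

```java
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

public class JustVsFromCallableDemo {

    // 假设的耗时/可能抛异常的方法
    static String computeExpensive() {
        return "result";
    }

    public static void main(String[] args) {
        // Mono.just:这一行执行时computeExpensive()已经运行;若抛异常,异常直接在此抛出
        Mono<String> eager = Mono.just(computeExpensive());

        // Mono.fromCallable:订阅时才执行;异常变成onError信号,可被链中操作符处理
        Mono<String> lazy = Mono.fromCallable(JustVsFromCallableDemo::computeExpensive)
                .subscribeOn(Schedulers.boundedElastic()) // 阻塞/耗时操作隔离到专属线程池
                .onErrorReturn("fallback");

        lazy.subscribe(System.out::println);
    }
}
```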
Q3:响应式编程的最大坑是什么?
A:阻塞调用混入响应式链。如果你在Flux的.map()中调用了阻塞的JDBC操作,Netty的IO线程就会被阻塞,性能反而比MVC更差。检测工具:引入BlockHound(Maven坐标io.projectreactor.tools:blockhound),它会在测试中检测所有阻塞调用。
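一个最小冒烟测试示意(BlockHound.install()通常放在测试基类或@BeforeAll中;版本号以官方文档为准):

```java
import org.junit.jupiter.api.Test;
import reactor.blockhound.BlockHound;
import reactor.core.publisher.Mono;
import reactor.test.StepVerifier;

import java.time.Duration;

class BlockHoundSmokeTest {

    @Test
    void shouldDetectBlockingCall() {
        BlockHound.install(); // 安装探测器:非阻塞线程上的阻塞调用会抛BlockingOperationError

        StepVerifier.create(
                Mono.delay(Duration.ofMillis(1)) // delay运行在parallel调度器(非阻塞线程)
                        .doOnNext(it -> {
                            try {
                                Thread.sleep(10); // 故意的阻塞调用,应被BlockHound拦截
                            } catch (InterruptedException e) {
                                Thread.currentThread().interrupt();
                            }
                        }))
                .expectError(Error.class) // BlockingOperationError是Error的子类
                .verify(Duration.ofSeconds(5));
    }
}
```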
Q4:如何在WebFlux中获取请求上下文(如用户信息)?
A:不能用ThreadLocal(响应式不绑定线程),要用Reactor Context:
Mono<String> result = ReactiveSecurityContextHolder
.getContext()
.map(ctx -> ctx.getAuthentication().getName())
.flatMap(userId -> aiService.chat(userId, message));
Q5:StepVerifier测试时间相关操作很慢,怎么加速?
A:使用StepVerifier.withVirtualTime(),它让Reactor的时钟可以虚拟推进,thenAwait(Duration.ofHours(1))可以瞬间完成,整个测试<100ms执行完毕。
Q6:R2DBC和JPA哪个更好用?
A:JPA生产上更成熟,文档更丰富,ORM功能更完整。R2DBC适合纯响应式架构,不能有任何阻塞调用。如果你已经有大量JPA代码,迁移成本很高,可以将JPA操作包装在Mono.fromCallable()中并切换到boundedElastic线程池,作为过渡方案。
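过渡方案示意(User与UserRepository为假设的实体和阻塞JPA接口):

```java
import java.util.Optional;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

public class JpaBridgeService {

    // 假设的阻塞JPA Repository接口(示意)
    interface UserRepository {
        Optional<User> findById(Long id);
    }

    record User(Long id, String name) {}

    private final UserRepository userRepository;

    JpaBridgeService(UserRepository userRepository) {
        this.userRepository = userRepository;
    }

    public Mono<User> findUserReactive(Long id) {
        // 把阻塞的JPA调用包装进Mono,并隔离到boundedElastic线程池,避免阻塞Netty IO线程
        return Mono.fromCallable(() -> userRepository.findById(id)
                        .orElseThrow(() -> new IllegalArgumentException("user not found: " + id)))
                .subscribeOn(Schedulers.boundedElastic());
    }
}
```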
结尾
小张的团队花了3天完成了响应式重构。核心变化是:
- Tomcat → Netty(自动切换,只改依赖)
- 阻塞的RestTemplate AI调用 → ChatClient.stream().content()返回Flux<String>
- 阻塞JDBC → R2DBC响应式数据库访问
- 同步Redis → ReactiveRedisTemplate
结果:同样3200并发,QPS从140提升到420,线程从200降到32,内存从14GB降到4.8GB。双十一大促,系统稳如磐石。
响应式编程不是银弹,但对AI流式输出这种I/O密集场景,它是最合适的工具。
