A Deep Dive into Spring AI's ChatMemory: More Than Storing Conversation History

Lao Zhang, 2026/4/30, about a 9-minute read

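Our system once hit a strange bug. A user reported that after chatting for a while (on what turned out to be server A), they switched pages, came back, and the assistant had "lost its memory".

I checked the logs and found that the user's two requests had landed on two different service instances.

It turned out we were using InMemoryChatMemory, which keeps conversation history in JVM memory. In a multi-instance deployment every machine maintains its own copy, so nothing is shared between them.

After that incident I went through Spring AI's ChatMemory system thoroughly. This article shares what I found, including a few details that are not widely known.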

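What ChatMemory really is

Many people think of ChatMemory as "a container that stores past messages". That view is incomplete.

A more accurate description: ChatMemory is the mechanism that manages state across a multi-turn conversation. Besides storing past messages, it is also responsible for:

Per-conversation isolation: histories with different conversationId values do not interfere with each other
History truncation: keeps the history from growing so long that it exceeds the token limit
Advisor integration: injects the history into the prompt before the request is sent, and appends the new turn to the history after the response returns

Here is the core interface: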

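public interface ChatMemory {
    
    // Fetch the last lastN messages of a conversation
    // (signature as used in this article; it differs between Spring AI versions)
    List<Message> get(String conversationId, int lastN);
    
    // Append messages to a conversation
    void add(String conversationId, List<Message> messages);
    
    // Clear a conversation's history
    void clear(String conversationId);
    
    // Key used to pass the conversation ID as an Advisor parameter
    String CONVERSATION_ID = "chat_memory_conversation_id";
}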

The interface is simple, but implementations differ a great deal.
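The three responsibilities listed above can be sketched without any framework. The following is a minimal, single-threaded illustration and not Spring AI code: `Msg`, `MemoryRoundTrip`, and the `model` function are hypothetical stand-ins for the message type, the advisor plus memory, and the LLM call.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, simplified stand-in; this is not the Spring AI Message type.
record Msg(String role, String text) {}

// Single-threaded sketch: a plain HashMap is used, so this is not thread-safe.
final class MemoryRoundTrip {
    private final Map<String, List<Msg>> store = new HashMap<>();
    private final int lastN;

    MemoryRoundTrip(int lastN) {
        this.lastN = lastN;
    }

    /** Returns at most the lastN most recent messages of one conversation. */
    List<Msg> history(String conversationId) {
        List<Msg> all = store.getOrDefault(conversationId, List.of());
        return List.copyOf(all.subList(Math.max(0, all.size() - lastN), all.size()));
    }

    /** One chat turn: inject history, call the model, then record the new turn. */
    String chat(String conversationId, String userText, Function<List<Msg>, String> model) {
        List<Msg> prompt = new ArrayList<>(history(conversationId)); // isolation + truncation
        Msg user = new Msg("user", userText);
        prompt.add(user);
        String answer = model.apply(prompt);                         // the "model call"
        store.computeIfAbsent(conversationId, id -> new ArrayList<>())
             .addAll(List.of(user, new Msg("assistant", answer)));   // append the new turn
        return answer;
    }
}
```

Calling `chat` twice with the same conversationId shows the second prompt carrying the first turn, while a different conversationId starts from an empty history.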

InMemoryChatMemory: internals and problems

First, a look at the in-memory implementation that ships with Spring AI:

// Simplified from the Spring AI source
public class InMemoryChatMemory implements ChatMemory {

    // A ConcurrentHashMap holds one message list per conversationId
    private final Map<String, List<Message>> conversationHistory = new ConcurrentHashMap<>();

    @Override
    public void add(String conversationId, List<Message> messages) {
        // merge() builds a new combined list, so existing lists are never mutated
        this.conversationHistory.merge(conversationId, new ArrayList<>(messages), 
            (existing, newMsgs) -> {
                List<Message> combined = new ArrayList<>(existing);
                combined.addAll(newMsgs);
                return combined;
            });
    }

    @Override
    public List<Message> get(String conversationId, int lastN) {
        List<Message> all = this.conversationHistory.getOrDefault(conversationId, List.of());
        // Return only the last N messages
        return all.size() <= lastN ? all : all.subList(all.size() - lastN, all.size());
    }

    @Override
    public void clear(String conversationId) {
        this.conversationHistory.remove(conversationId);
    }
}

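The problems are now clear:

Problem 1: inconsistent data across instances (the most serious)

The load balancer sends the same user's requests to different instances. Each instance has its own conversationHistory, invisible to the others.

Problem 2: history is lost on restart

Data in JVM memory is gone after a restart. A user who was halfway through a conversation yesterday comes back today to a blank slate.

Problem 3: unbounded memory growth

If the system has many users and every user's history lives in memory, memory use keeps growing over time. InMemoryChatMemory has no expiry or cleanup mechanism.

When is InMemoryChatMemory a reasonable choice?

Demo or development environments with a single instance
Scenarios where history is unimportant and may be wiped on every start
Tests

Production environments almost always need a persistent ChatMemory.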

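Conversation state in multi-instance deployments

This is the heart of the problem: two requests reach different instances, and each instance's in-memory store holds different content.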
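The failure is easy to reproduce in miniature. The sketch below is a hypothetical model, not Spring AI code: `Instance` stands for one service instance, and `RoutingDemo` plays the load balancer by sending two messages of the same conversation to two different instances, once with per-instance maps and once with a shared map standing in for external storage.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of one service instance holding a chat memory.
final class Instance {
    private final Map<String, List<String>> memory;

    Instance(Map<String, List<String>> memory) {
        this.memory = memory;
    }

    /** Stores the message and returns how many messages this instance can see. */
    int handle(String conversationId, String message) {
        List<String> history = memory.computeIfAbsent(conversationId, id -> new ArrayList<>());
        history.add(message);
        return history.size();
    }
}

final class RoutingDemo {
    /** Sends two messages of one conversation to two different instances. */
    static int visibleAfterTwoRequests(boolean sharedStore) {
        Map<String, List<String>> shared = new HashMap<>();
        Instance a = new Instance(sharedStore ? shared : new HashMap<>());
        Instance b = new Instance(sharedStore ? shared : new HashMap<>());
        a.handle("user:1", "first message");
        return b.handle("user:1", "second message");
    }
}
```

With per-instance maps the second instance sees only one message, which is exactly the "lost memory" symptom; with the shared map it sees both.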

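There are two ways to solve it:

Sticky sessions: the load balancer routes all of a user's requests to the same instance. Simple, but each instance becomes a single point of failure: if it goes down, its users' histories are lost.
Distributed ChatMemory: history lives in shared external storage (Redis or a database) that all instances use. This is the real fix.

A Redis-based distributed ChatMemory implementation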

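import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.Data;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.chat.messages.AssistantMessage;
import org.springframework.ai.chat.messages.Message;
import org.springframework.ai.chat.messages.SystemMessage;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.stereotype.Component;

import java.time.Duration;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

@Component
public class RedisChatMemory implements ChatMemory {

    private static final Logger log = LoggerFactory.getLogger(RedisChatMemory.class);
    
    private static final String KEY_PREFIX = "chat:history:";
    private static final Duration DEFAULT_TTL = Duration.ofDays(7);  // keep conversation history for 7 days
    
    private final RedisTemplate<String, String> redisTemplate;
    private final ObjectMapper objectMapper;
    
    public RedisChatMemory(RedisTemplate<String, String> redisTemplate, 
                            ObjectMapper objectMapper) {
        this.redisTemplate = redisTemplate;
        this.objectMapper = objectMapper;
    }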

    @Override
    public void add(String conversationId, List<Message> messages) {
        String key = buildKey(conversationId);
        
        // Serialize the messages; entries that fail to serialize are skipped
        List<String> serialized = messages.stream()
            .map(this::serializeMessage)
            .filter(Objects::nonNull)
            .collect(Collectors.toList());
        
        if (serialized.isEmpty()) return;
        
        // Store in a Redis List; rightPushAll preserves insertion order
        redisTemplate.opsForList().rightPushAll(key, serialized);
        
        // Refresh the expiry (reset on every activity)
        redisTemplate.expire(key, DEFAULT_TTL);
        
        // Cap the history at the latest 200 messages so it cannot grow without bound.
        // LTRIM with a negative start keeps the tail of the list and leaves shorter
        // lists untouched, so no separate size() round trip is needed.
        redisTemplate.opsForList().trim(key, -200, -1);
    }

    @Override
    public List<Message> get(String conversationId, int lastN) {
        String key = buildKey(conversationId);
        
        Long size = redisTemplate.opsForList().size(key);
        if (size == null || size == 0) {
            return Collections.emptyList();
        }
        
        // Read the last lastN entries
        long start = Math.max(0, size - lastN);
        List<String> serialized = redisTemplate.opsForList().range(key, start, -1);
        
        if (serialized == null || serialized.isEmpty()) {
            return Collections.emptyList();
        }
        
        return serialized.stream()
            .map(this::deserializeMessage)
            .filter(Objects::nonNull)
            .collect(Collectors.toList());
    }

    @Override
    public void clear(String conversationId) {
        redisTemplate.delete(buildKey(conversationId));
    }
    
    /**
     * Returns the total number of messages in a conversation (for statistics or paging).
     */
    public long getMessageCount(String conversationId) {
        Long size = redisTemplate.opsForList().size(buildKey(conversationId));
        return size != null ? size : 0;
    }
    
    /**
     * Returns one page of conversation history (for a history view).
     * Pages are zero-based and counted from the oldest message.
     */
    public List<Message> getPage(String conversationId, int page, int pageSize) {
        String key = buildKey(conversationId);
        Long size = redisTemplate.opsForList().size(key);
        if (size == null || size == 0) return Collections.emptyList();
        
        long start = (long) page * pageSize;
        long end = start + pageSize - 1;
        
        List<String> serialized = redisTemplate.opsForList().range(key, start, end);
        if (serialized == null) return Collections.emptyList();
        
        return serialized.stream()
            .map(this::deserializeMessage)
            .filter(Objects::nonNull)
            .collect(Collectors.toList());
    }

    private String buildKey(String conversationId) {
        return KEY_PREFIX + conversationId;
    }

    // Returns null when serialization fails; callers filter nulls out
    private String serializeMessage(Message message) {
        try {
            MessageDTO dto = new MessageDTO();
            dto.setRole(message.getMessageType().getValue());
            // Newer Spring AI releases expose the text via getText() instead of getContent()
            dto.setContent(message.getContent());
            dto.setTimestamp(System.currentTimeMillis());
            return objectMapper.writeValueAsString(dto);
        } catch (JsonProcessingException e) {
            log.error("Failed to serialize message", e);
            return null;
        }
    }

    // Returns null for unknown roles (for example tool messages) or malformed JSON
    private Message deserializeMessage(String json) {
        try {
            MessageDTO dto = objectMapper.readValue(json, MessageDTO.class);
            return switch (dto.getRole()) {
                case "user" -> new UserMessage(dto.getContent());
                case "assistant" -> new AssistantMessage(dto.getContent());
                case "system" -> new SystemMessage(dto.getContent());
                default -> {
                    log.warn("Unknown message role: {}", dto.getRole());
                    yield null;
                }
            };
        } catch (JsonProcessingException e) {
            log.error("Failed to deserialize message: {}", json, e);
            return null;
        }
    }
    
    // JSON shape stored in Redis for each message
    @Data
    private static class MessageDTO {
        private String role;
        private String content;
        private long timestamp;
    }
}
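To unit-test code that depends on this behaviour without a Redis server, the two storage rules (cap the list, reset the TTL on every write) can be modelled in memory. This is a hypothetical sketch, not part of the implementation above: `CappedExpiringHistory` is an invented name, and the current time is passed in explicitly so that expiry can be tested deterministically.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory model of "capped list + sliding TTL"; hypothetical, for tests only.
final class CappedExpiringHistory {
    private record Entry(Deque<String> messages, long expiresAtMillis) {}

    private final Map<String, Entry> store = new HashMap<>();
    private final int maxMessages;
    private final long ttlMillis;

    CappedExpiringHistory(int maxMessages, long ttlMillis) {
        this.maxMessages = maxMessages;
        this.ttlMillis = ttlMillis;
    }

    /** Appends messages, drops the oldest beyond the cap, and resets the TTL. */
    void add(String key, List<String> messages, long nowMillis) {
        Entry old = store.get(key);
        Deque<String> deque = (old == null || old.expiresAtMillis() <= nowMillis)
                ? new ArrayDeque<>() : old.messages();
        deque.addAll(messages);                                   // like RPUSH
        while (deque.size() > maxMessages) {
            deque.removeFirst();                                  // like LTRIM key -max -1
        }
        store.put(key, new Entry(deque, nowMillis + ttlMillis));  // like EXPIRE
    }

    /** Returns the last n messages, or an empty list if the key has expired. */
    List<String> lastN(String key, int n, long nowMillis) {
        Entry e = store.get(key);
        if (e == null || e.expiresAtMillis() <= nowMillis) {
            return List.of();
        }
        List<String> all = new ArrayList<>(e.messages());
        return all.subList(Math.max(0, all.size() - Math.max(0, n)), all.size());
    }
}
```

With a cap of 3, adding four messages leaves the three newest, and each write pushes the expiry forward by the full TTL.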

Bean configuration:

@Configuration
public class ChatMemoryConfig {

    // RedisChatMemory is annotated with @Component, so it is already registered as
    // a ChatMemory bean; declaring a second @Bean that returns it would be redundant.
    
    @Bean
    public PromptChatMemoryAdvisor promptChatMemoryAdvisor(ChatMemory chatMemory) {
        return new PromptChatMemoryAdvisor(chatMemory);
    }
}

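Usage:

@Service
public class ConversationalChatService {

    private final ChatClient chatClient;
    private final RedisChatMemory chatMemory;

    // Assumption: the ChatClient is built here so that the memory advisor is attached to every call
    public ConversationalChatService(ChatClient.Builder chatClientBuilder,
                                     PromptChatMemoryAdvisor memoryAdvisor,
                                     RedisChatMemory chatMemory) {
        this.chatClient = chatClientBuilder.defaultAdvisors(memoryAdvisor).build();
        this.chatMemory = chatMemory;
    }

    public String chat(String userId, String message) {
        // conversationId is per user, so one user's devices share the same history
        String conversationId = "user:" + userId;
        
        return chatClient.prompt()
            .user(message)
            .advisors(spec -> spec
                .param(ChatMemory.CONVERSATION_ID, conversationId)
            )
            .call()
            .content();
    }
    
    /**
     * Returns conversation history (for the front end's history view).
     */
    public List<MessageVO> getHistory(String userId, int page, int pageSize) {
        String conversationId = "user:" + userId;
        List<Message> messages = chatMemory.getPage(conversationId, page, pageSize);
        
        return messages.stream()
            .map(msg -> new MessageVO(
                msg.getMessageType().getValue(),
                msg.getContent()
            ))
            .collect(Collectors.toList());
    }
    
    /**
     * Starts a new conversation (clears the history).
     */
    public void startNewConversation(String userId) {
        chatMemory.clear("user:" + userId);
    }
}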

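How ChatMemory interacts with PromptChatMemoryAdvisor

Many people who use ChatMemory only know that they need to configure a PromptChatMemoryAdvisor, without understanding how it injects history into the prompt.

Spring AI offers two ways to inject history. They are selected by choosing the advisor class, not through a constructor parameter:

// Option 1: inject history into the system message
new PromptChatMemoryAdvisor(chatMemory)

// Option 2: inject history as a list of separate messages
new MessageChatMemoryAdvisor(chatMemory)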

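The difference between the two:

PromptChatMemoryAdvisor formats the history as text and appends it to the system prompt, roughly like this:

[original system prompt]

Here is the earlier conversation:
User: [history message 1]
Assistant: [history message 2]
User: [history message 3]
Assistant: [history message 4]

MessageChatMemoryAdvisor adds the history to the message list as separate Message objects:

SystemMessage: [original system prompt]
UserMessage: [history message 1]
AssistantMessage: [history message 2]
UserMessage: [history message 3]
AssistantMessage: [history message 4]
UserMessage: [current question]

Which one is better?

MessageChatMemoryAdvisor matches the standard format for multi-turn conversations. Most models interpret this format more accurately because it preserves each message's role.

PromptChatMemoryAdvisor is more flexible: you can customize how the history is formatted, which suits scenarios with special requirements for the history format.

I generally recommend MessageChatMemoryAdvisor.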
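The two formats can be compared side by side in plain Java. This is an illustrative sketch only and does not reproduce the advisors' actual templates: `ChatMsg` and `HistoryInjection` are hypothetical names, and the "Conversation so far" wording is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplified message type; not the Spring AI Message interface.
record ChatMsg(String role, String text) {}

final class HistoryInjection {
    /** Prompt style: history is flattened to text and appended to the system prompt. */
    static List<ChatMsg> asSystemText(String system, List<ChatMsg> history, String question) {
        StringBuilder sb = new StringBuilder(system).append("\n\nConversation so far:\n");
        for (ChatMsg m : history) {
            sb.append(m.role()).append(": ").append(m.text()).append('\n');
        }
        return List.of(new ChatMsg("system", sb.toString()), new ChatMsg("user", question));
    }

    /** Message style: history stays as separate messages with their roles intact. */
    static List<ChatMsg> asMessages(String system, List<ChatMsg> history, String question) {
        List<ChatMsg> out = new ArrayList<>();
        out.add(new ChatMsg("system", system));
        out.addAll(history);
        out.add(new ChatMsg("user", question));
        return out;
    }
}
```

For a two-message history, the prompt style always produces two messages (system and user), while the message style produces four, one per role-tagged entry.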

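How to set ChatMemory's lastN parameter

The lastN argument of get(conversationId, lastN) decides how many history messages are carried into each request.

There is an implicit token calculation here:

Every history message consumes tokens. If the history is too long, then together with the current prompt and the user's question it can easily exceed the model's context window.

A rough estimate (using GPT-4o's 128K context window as an example):

System prompt: about 500 tokens
Current user message: about 200 tokens
Each history message: about 100 tokens on average
RAG retrieved documents: about 2,000 tokens

Safe number of history messages: (128000 - 500 - 200 - 2000) / 100 = 1253

In practice you do not need nearly that many. Keeping the most recent 20 to 40 messages is usually enough, because what the model needs is recent context; history from long ago has little effect on the current question.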
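The estimate above is simple integer arithmetic, written out here as a small helper. `HistoryBudget` is a hypothetical name, and the inputs are the rough figures from the list, not measured values.

```java
final class HistoryBudget {
    /** Number of history messages that fit after the fixed costs are subtracted. */
    static int maxHistoryMessages(int contextWindow, int systemTokens, int userTokens,
                                  int ragTokens, int avgTokensPerMessage) {
        int remaining = contextWindow - systemTokens - userTokens - ragTokens;
        // Integer division rounds down; never return a negative count
        return Math.max(0, remaining / avgTokensPerMessage);
    }
}
```

`HistoryBudget.maxHistoryMessages(128_000, 500, 200, 2_000, 100)` returns 1253. Note that this sketch reserves nothing for the model's reply, so a real budget would likely subtract an allowance for output tokens as well.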

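// Configure lastN on the advisor (constructor shape from Spring AI milestone releases; check your version)
new MessageChatMemoryAdvisor(chatMemory, 
    "default",  // default conversationId, used when a request does not supply one
    20  // carry at most the latest 20 messages (10 turns)
)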

For token-sensitive scenarios, lastN can be computed dynamically:

@Service
public class DynamicMemoryService {

    // Token budget shared by the system prompt, the current message, and the history
    private static final int MAX_CONTEXT_TOKENS = 4000;
    private static final int AVG_TOKENS_PER_MESSAGE = 100;

    public int calculateLastN(String currentMessage, String systemPrompt) {
        int currentTokens = estimateTokens(currentMessage) + estimateTokens(systemPrompt);
        int remainingBudget = MAX_CONTEXT_TOKENS - currentTokens;
        // Always keep at least 2 messages (one full turn)
        return Math.max(2, remainingBudget / AVG_TOKENS_PER_MESSAGE);
    }
    
    private int estimateTokens(String text) {
        if (text == null) return 0;
        return (int) (text.length() / 1.5);  // rough estimate for Chinese text
    }
}
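An average of 100 tokens per message can be far off when message lengths vary. One possible refinement, sketched here under the same rough 1.5-characters-per-token assumption, is to walk the history from newest to oldest and stop once the budget is used up. `TokenAwareWindow` is a hypothetical helper, and plain strings stand in for messages.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class TokenAwareWindow {
    /** Same rough heuristic as above: about 1.5 characters per token. */
    static int estimateTokens(String text) {
        return text == null ? 0 : (int) (text.length() / 1.5);
    }

    /** Keeps the newest messages whose estimated tokens fit within the budget. */
    static List<String> select(List<String> history, int tokenBudget) {
        List<String> picked = new ArrayList<>();
        int used = 0;
        for (int i = history.size() - 1; i >= 0; i--) {  // walk newest to oldest
            int cost = estimateTokens(history.get(i));
            if (used + cost > tokenBudget) {
                break;                                   // first message that does not fit ends the window
            }
            used += cost;
            picked.add(history.get(i));
        }
        Collections.reverse(picked);                     // restore chronological order
        return picked;
    }
}
```

Stopping at the first message that does not fit keeps the window contiguous, so the model never sees a history with a gap in the middle.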

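Strategies for designing the conversation ID

The design of conversationId deserves care. Different business scenarios call for different strategies:

// Strategy 1: per user (one user shares history across devices)
String conversationId = "user:" + userId;

// Strategy 2: per user and device (each device has its own history)
String conversationId = "user:" + userId + ":device:" + deviceId;

// Strategy 3: per user and session (a user can open several independent conversations)
// sessionId is generated by the front end; prefixing the userId keeps it scoped to its owner
String conversationId = "user:" + userId + ":session:" + sessionId;

// Strategy 4: per user and business scene (one user has separate context in each feature module)
String conversationId = "user:" + userId + ":scene:" + sceneCode;
// For example: user:123:scene:customer-service
//              user:123:scene:code-review

We currently use strategy 4. Conversation context is fully isolated between business modules, which avoids context pollution.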
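Because these IDs are built by string concatenation, it can help to centralize the construction and reject values that contain the separator. A minimal sketch follows; `ConversationIds` and its validation rule are assumptions for illustration, not something Spring AI provides.

```java
final class ConversationIds {
    /** Strategy 1: one history per user. */
    static String forUser(String userId) {
        return "user:" + requireSegment(userId);
    }

    /** Strategy 4: one history per user and business scene. */
    static String forUserScene(String userId, String sceneCode) {
        return forUser(userId) + ":scene:" + requireSegment(sceneCode);
    }

    /** Rejects blank segments and the ':' separator so that keys cannot collide. */
    private static String requireSegment(String value) {
        if (value == null || value.isBlank() || value.contains(":")) {
            throw new IllegalArgumentException("invalid id segment: " + value);
        }
        return value;
    }
}
```

`ConversationIds.forUserScene("123", "code-review")` yields `user:123:scene:code-review`, while a user ID such as `1:scene:x` is rejected instead of silently producing a key that collides with another scope.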

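An easily overlooked problem: leaking memory in Redis

Once ChatMemory lives in Redis, conversation history stays there forever unless a sensible TTL is set.

This matters especially for:

Conversations created by guest (not logged-in) users
Test data produced by automated tests
History belonging to deleted accounts

All of these occupy Redis memory permanently.

Solutions:

Set a sensible TTL (7 days in the code above)
Clear the history when an account is deleted
Periodically clean up conversations that have been inactive for a long time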

// Scheduled cleanup (runs every day at 02:00)
@Scheduled(cron = "0 0 2 * * ?")
public void cleanupInactiveConversations() {
    // Find the IDs of users with no activity for more than 30 days and delete their Redis data
    List<String> inactiveUserIds = conversationActivityMapper.findInactive(30);
    inactiveUserIds.forEach(id -> chatMemory.clear("user:" + id));
    log.info("Cleaned up {} inactive conversations", inactiveUserIds.size());
}
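A single TTL does not have to fit every kind of conversation. One option is to pick the TTL from the conversationId, so guest and test data expire sooner than logged-in users' history. The sketch below is hypothetical: the `guest:` and `test:` prefixes and the durations are illustrative assumptions, not conventions from the code above.

```java
import java.time.Duration;

final class TtlPolicy {
    /** Picks a TTL from the conversationId prefix; prefixes and durations are illustrative. */
    static Duration ttlFor(String conversationId) {
        if (conversationId.startsWith("guest:")) {
            return Duration.ofDays(1);   // anonymous visitors: short retention
        }
        if (conversationId.startsWith("test:")) {
            return Duration.ofHours(1);  // automated test data: expire quickly
        }
        return Duration.ofDays(7);       // default: the 7-day retention used in this article
    }
}
```

A Redis-backed memory could pass `TtlPolicy.ttlFor(conversationId)` to its expire call in place of a fixed constant.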

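Summary

ChatMemory looks like a small feature, but in production there are many details to watch:

Never use InMemoryChatMemory in a multi-instance deployment; it is a recipe for production incidents
Redis is the standard choice: simple to implement and effective
Set lastN sensibly: too small loses context, too large wastes tokens
Design conversationId around the business scenario; a bare userId does not fit every case
Always set a TTL and a cleanup mechanism so storage cannot grow without bound

Multi-turn conversation is the core experience of an AI application, and users feel the quality of ChatMemory more directly than almost anything else.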