Dynamic Context Management for AI Applications: Intelligently Compressing and Expanding Conversation History

老张 | 2026/10/26 | about 24 min read | Tags: context management, conversation history, token optimization, Spring AI, Java

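The Crash at Round 20

Zhao Lei is the technical lead at an online legal consultation platform, and he will never forget one evening in March 2025.

A user was asking about a complicated labor dispute, and the conversation had reached its 20th round. Over the first 10 rounds the user had laid out the whole story in detail: start date, contract terms, the specifics of the company's breach, the supporting evidence... close to 8,000 tokens in all.

At round 20 the user asked: "Based on what I told you earlier, how much compensation does the company owe me for dismissing me?"

The AI's answer left the user stunned:

"Hello! Could you describe your situation? I need your employment contract details, company size, years of service, and similar information before I can calculate the compensation accurately."

The AI had forgotten every key fact from the first 10 rounds.

The user vented in the comments: "I spent ages explaining things to this AI and it acts like it has amnesia. It remembers nothing. What kind of garbage product is this?"

When Zhao Lei investigated, he found that the system used a simple sliding-window strategy that kept only the most recent N rounds, with N = 10. Once a conversation passed 10 rounds, the earliest key information was simply dropped, with nothing in place to preserve it.

The hard limit of the token window collided head-on with the user's expectation that "the AI should remember everything I said".

That is the core problem this article sets out to solve.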

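The Conflict Between Context Window Limits and Business Needs

The reality of token windows

Model	Context window	Approximate capacity
GPT-3.5-turbo	16K tokens	about 12,000 Chinese characters
GPT-4o	128K tokens	about 96,000 Chinese characters
Claude 3.5 Sonnet	200K tokens	about 150,000 Chinese characters
Qwen-Max	32K tokens	about 24,000 Chinese characters
GLM-4	128K tokens	about 96,000 Chinese characters

On the surface, a 128K window looks plenty large. In real-world usage, though: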

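Token consumption of one complex consultation:
- System prompt: 500-2000 tokens
- RAG-retrieved context documents: 2000-5000 tokens
- 20 rounds of conversation history: 300 tokens per round on average × 20 = 6000 tokens
- Current question + answer: 500-3000 tokens

Total: about 9000-16000 tokens

By the time the user reaches round 50:
- Conversation history: 50 × 300 = 15000 tokens
- With the other parts added, the total comes to roughly 18000-25000 tokens, more than a 16K window can hold

Cost matters even more. Every call resends the entire history, so the input tokens per call grow roughly linearly with the number of rounds: the history portion at round 50 is 50 times what it was at round 1. Because every one of those calls is billed, the cumulative cost of a conversation grows roughly quadratically.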
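
To make that growth concrete, here is a minimal sketch that recomputes the figures above. The constants are assumptions taken from the low end of the estimate (500 + 2000 tokens of fixed overhead, 300 tokens per round), and the class and method names are illustrative only, not part of any library.

```java
/** Sketch: per-call and cumulative input tokens when the full history is resent every round. */
class ContextCostSketch {

    // Assumed figures from the estimate above; real values vary by application
    static final int FIXED_TOKENS = 2_500;    // system prompt (500) + RAG context (2000), low end
    static final int TOKENS_PER_ROUND = 300;  // average tokens per dialogue round

    /** Input tokens sent on the call that carries `round` rounds of history. */
    static long inputTokensAtRound(int round) {
        return FIXED_TOKENS + (long) TOKENS_PER_ROUND * round;
    }

    /** Total input tokens billed across rounds 1..rounds. */
    static long cumulativeInputTokens(int rounds) {
        long total = 0;
        for (int r = 1; r <= rounds; r++) {
            total += inputTokensAtRound(r);
        }
        return total;
    }

    public static void main(String[] args) {
        for (int round : new int[] {1, 10, 20, 50}) {
            System.out.printf("round %d: per-call=%d, cumulative=%d%n",
                round, inputTokensAtRound(round), cumulativeInputTokens(round));
        }
    }
}
```

Under these assumptions a single call at round 50 carries 17,500 input tokens, while the whole 50-round conversation has sent 507,500. The quadratic cumulative term is what compression is meant to flatten.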

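The nature of the conflict

Conversation History Compression: Summarizing History with an LLM

The most intuitive approach: use AI to compress the AI's own history.

Implementation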

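/**
 * LLM-based conversation history compressor
 * 
 * Strategy:
 * 1. When the history exceeds a threshold, compress the early rounds into a summary
 * 2. Keep the most recent K rounds verbatim (for detail)
 * 3. Pass the summary plus the recent verbatim rounds to the LLM as the new context
 */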
@Service
@Slf4j
public class ConversationHistoryCompressor {
    
    private final ChatClient chatClient;
    
    // 压缩触发阈值：对话历史超过这个Token数时触发压缩
    private static final int COMPRESSION_TRIGGER_TOKENS = 3000;
    
    // 压缩后保留的最近完整轮数
    private static final int RECENT_ROUNDS_TO_KEEP = 5;
    
    // 摘要最大长度（Token数估算）
    private static final int MAX_SUMMARY_TOKENS = 500;
    
    // 压缩用的Prompt
    private static final String COMPRESSION_PROMPT = """
        请对以下对话历史进行摘要压缩，保留所有重要信息：
        
        要求：
        1. 保留所有关键事实、数字、人名、日期
        2. 保留用户提出的主要问题和诉求
        3. 保留已达成的结论和决策
        4. 忽略闲聊、感谢等无实质内容的交流
        5. 摘要用第三人称叙述，格式如下：
        
        【用户信息摘要】
        - 背景：...
        - 主要诉求：...
        - 关键事实：...
        
        【对话进展摘要】
        - 已讨论的问题：...
        - 已给出的建议/结论：...
        - 待解决的问题：...
        
        对话历史：
        {conversation_history}
        
        请直接输出摘要，不要输出任何前缀说明。
        """;
    
    public ConversationHistoryCompressor(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder.build();
    }
    
    /**
     * 压缩对话历史
     * 
     * @param messages 完整对话历史
     * @return 压缩后的消息列表（摘要消息 + 最近几轮）
     */
    public List<Message> compress(List<Message> messages) {
        if (messages == null || messages.isEmpty()) {
            return messages;
        }
        
        // 估算当前Token数
        int totalTokens = estimateTotalTokens(messages);
        
        if (totalTokens <= COMPRESSION_TRIGGER_TOKENS) {
            log.debug("历史Token数({})未超过阈值，无需压缩", totalTokens);
            return messages;
        }
        
        log.info("对话历史Token数({})超过阈值({})，开始压缩", 
                 totalTokens, COMPRESSION_TRIGGER_TOKENS);
        
        // 确定压缩范围：保留最近K轮，压缩早期对话
        int recentStartIndex = Math.max(0, messages.size() - RECENT_ROUNDS_TO_KEEP * 2);
        // × 2 是因为每轮包含用户消息和AI消息
        
        List<Message> toCompress = messages.subList(0, recentStartIndex);
        List<Message> toKeep = messages.subList(recentStartIndex, messages.size());
        
        if (toCompress.isEmpty()) {
            return messages;
        }
        
        // 生成摘要
        String summary = generateSummary(toCompress);
        
        // 将摘要包装为系统消息
        Message summaryMessage = new SystemMessage(
            "【早期对话摘要】\n" + summary + "\n\n" +
            "（以上为对话前期的摘要，以下是最近的完整对话记录）"
        );
        
        // 组合：摘要 + 最近完整对话
        List<Message> compressedHistory = new ArrayList<>();
        compressedHistory.add(summaryMessage);
        compressedHistory.addAll(toKeep);
        
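        int compressedTokens = estimateTotalTokens(compressedHistory);
        // SLF4J only understands "{}" placeholders, so format the percentage explicitly
        log.info("压缩完成：原{}Token -> 压缩后{}Token，压缩率{}%",
                 totalTokens, compressedTokens,
                 String.format("%.1f", (1.0 - (double) compressedTokens / totalTokens) * 100));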
        
        return compressedHistory;
    }
    
    /**
     * 调用LLM生成对话摘要
     */
    private String generateSummary(List<Message> messages) {
        // 将消息列表转换为文本格式
        StringBuilder historyText = new StringBuilder();
        for (Message msg : messages) {
            String role = msg instanceof UserMessage ? "用户" : "助手";
            historyText.append(role).append("：").append(msg.getContent()).append("\n\n");
        }
        
        String prompt = COMPRESSION_PROMPT.replace(
            "{conversation_history}", historyText.toString());
        
        try {
            String summary = chatClient.prompt()
                .user(prompt)
                .call()
                .content();
            
            // 验证摘要质量（确保不为空且有一定长度）
            if (summary == null || summary.length() < 50) {
                log.warn("生成的摘要质量不足，使用简单截断策略");
                return generateSimpleSummary(messages);
            }
            
            return summary;
        } catch (Exception e) {
            log.error("生成摘要失败，降级到简单截断", e);
            return generateSimpleSummary(messages);
        }
    }
    
    /**
     * 降级方案：简单摘要（提取关键消息）
     */
    private String generateSimpleSummary(List<Message> messages) {
        StringBuilder sb = new StringBuilder("早期对话关键内容：\n");
        for (Message msg : messages) {
            if (msg instanceof UserMessage) {
                String content = msg.getContent();
                if (content.length() > 100) {
                    content = content.substring(0, 100) + "...";
                }
                sb.append("- 用户曾提问：").append(content).append("\n");
            }
        }
        return sb.toString();
    }
    
    /**
     * Estimate the token count (simplified; use a real tokenizer such as Tiktoken in practice)
     */
    private int estimateTotalTokens(List<Message> messages) {
        return messages.stream()
            .mapToInt(msg -> estimateTokens(msg.getContent()))
            .sum();
    }
    
    private int estimateTokens(String text) {
        if (text == null) return 0;
        // Rough estimate: about 1.5 tokens per Chinese character and about 0.5 tokens
        // per other character (letters, digits, punctuation, whitespace)
        int cnChars = (int) text.chars().filter(c -> c >= 0x4E00 && c <= 0x9FFF).count();
        int otherChars = text.length() - cnChars;
        return (int)(cnChars * 1.5 + otherChars * 0.5);
    }
}
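
The core of the compressor is the split into "summary of the old part" plus "recent rounds kept verbatim". The sketch below isolates that step so it runs without Spring AI or a model. Everything in it is an illustrative assumption: the `Msg` record stands in for a chat message, the summarizer is passed in as a plain function, and the loop that moves the split point forward to a user message is an addition not found in the class above, included so that a round is never cut in half.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class CompressionSketch {

    /** Minimal stand-in for a chat message; role is "user", "assistant" or "system". */
    record Msg(String role, String text) {}

    /**
     * Replaces everything except the last keepRounds rounds with one summary message.
     * The summarizer is injected so the sketch can run with a stub instead of an LLM.
     */
    static List<Msg> compress(List<Msg> history, int keepRounds,
                              Function<List<Msg>, String> summarizer) {
        int split = Math.max(0, history.size() - keepRounds * 2);
        // Move the split forward to a user message so a round is never cut in half
        while (split < history.size() && !history.get(split).role().equals("user")) {
            split++;
        }
        if (split == 0) {
            return history; // nothing old enough to compress
        }
        List<Msg> result = new ArrayList<>();
        result.add(new Msg("system", "[Summary of earlier conversation]\n"
            + summarizer.apply(history.subList(0, split))));
        result.addAll(history.subList(split, history.size()));
        return result;
    }

    public static void main(String[] args) {
        List<Msg> history = new ArrayList<>();
        for (int i = 1; i <= 8; i++) {
            history.add(new Msg("user", "question " + i));
            history.add(new Msg("assistant", "answer " + i));
        }
        // Stub summarizer: only reports how many messages were folded away
        List<Msg> compressed = compress(history, 3, old -> old.size() + " earlier messages");
        compressed.forEach(m -> System.out.println(m.role() + ": " + m.text()));
    }
}
```

With 8 rounds of history and `keepRounds = 3`, the first 10 messages are folded into one system message and rounds 6 to 8 are passed through unchanged.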

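Sliding Window Strategy: Keep the Last N Rounds Plus a Summary of Key Information

The sliding window is the most commonly used strategy. The key is not to let important information slide out along with the old messages.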

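/**
 * Smart sliding-window context manager
 * 
 * How it improves on a plain sliding window:
 * 1. Messages that fall outside the window are not simply discarded
 * 2. Their important information is extracted and kept in summary form
 */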
@Service
@Slf4j
public class SlidingWindowContextManager {
    
    private final ConversationHistoryCompressor compressor;
    private final ImportanceScoringService importanceScoring;
    
    // 窗口大小（保留最近N轮）
    @Value("${context.window.size:10}")
    private int windowSize;
    
    // 强制保留的重要消息（不受窗口限制）
    @Value("${context.important.max:5}")
    private int maxImportantMessages;
    
    public SlidingWindowContextManager(ConversationHistoryCompressor compressor,
                                        ImportanceScoringService importanceScoring) {
        this.compressor = compressor;
        this.importanceScoring = importanceScoring;
    }
    
    /**
     * 管理上下文，返回当前轮次应该传给LLM的消息列表
     * 
     * @param fullHistory 完整对话历史
     * @param systemPrompt 系统提示词
     * @return 经过管理的消息列表
     */
    public List<Message> manageContext(List<Message> fullHistory, 
                                        String systemPrompt) {
        List<Message> result = new ArrayList<>();
        
        // 1. 添加系统提示词
        if (systemPrompt != null && !systemPrompt.isEmpty()) {
            result.add(new SystemMessage(systemPrompt));
        }
        
        if (fullHistory == null || fullHistory.isEmpty()) {
            return result;
        }
        
        // 2. 提取"重要消息"（无论多早都要保留）
        List<Message> importantMessages = extractImportantMessages(fullHistory);
        
        // 3. 获取最近N轮对话
        int recentCount = Math.min(fullHistory.size(), windowSize * 2);
        List<Message> recentMessages = fullHistory.subList(
            fullHistory.size() - recentCount, fullHistory.size());
        
        // 4. 找出不在最近窗口内的重要消息（需要单独注入）
        Set<Message> recentSet = new HashSet<>(recentMessages);
        List<Message> extraImportantMessages = importantMessages.stream()
            .filter(msg -> !recentSet.contains(msg))
            .limit(maxImportantMessages)
            .collect(Collectors.toList());
        
        // 5. 如果有超出窗口的内容，生成摘要
        if (fullHistory.size() > recentCount) {
            List<Message> olderMessages = fullHistory.subList(
                0, fullHistory.size() - recentCount);
            
            // 只有旧消息不全是重要消息时才需要摘要
            if (!extraImportantMessages.containsAll(olderMessages)) {
                String summary = generateContextSummary(olderMessages);
                result.add(new SystemMessage("【早期对话摘要】\n" + summary));
            }
        }
        
        // 6. 注入超出窗口的重要消息
        if (!extraImportantMessages.isEmpty()) {
            result.add(new SystemMessage("【关键历史信息】（早期对话中的重要内容）"));
            result.addAll(extraImportantMessages);
            result.add(new SystemMessage("【以上为关键历史信息，以下为最近对话】"));
        }
        
        // 7. 添加最近对话
        result.addAll(recentMessages);
        
        log.debug("上下文管理完成：全量{}条 -> 管理后{}条消息", 
                  fullHistory.size(), result.size());
        
        return result;
    }
    
    /**
     * 提取重要消息（基于内容分析）
     */
    private List<Message> extractImportantMessages(List<Message> messages) {
        return messages.stream()
            .filter(msg -> isImportantMessage(msg))
            .collect(Collectors.toList());
    }
    
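    /**
     * Decide whether a message is important.
     * Typical signs of an important message:
     * - personal information (name, contact details, IDs)
     * - numbers or dates (amounts, points in time)
     * - facts the user states explicitly
     * - the user's core request
     */
    private boolean isImportantMessage(Message msg) {
        if (!(msg instanceof UserMessage)) return false;
        
        String content = msg.getContent();
        if (content == null || content.length() < 20) return false;
        
        // Simple rule-based check (use more precise NLP in production).
        // (?s) lets "." match line breaks, so multi-line messages are not missed.
        return content.matches("(?s).*\\d{4,}.*") ||  // a run of 4 or more digits
               content.matches("(?s).*[（(].*[）)].*") || // a parenthetical remark
               content.length() > 200 ||           // long messages usually carry important detail
               content.contains("我的") ||
               content.contains("情况是") ||
               content.contains("合同") ||
               content.contains("证据");
    }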
    
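    private String generateContextSummary(List<Message> messages) {
        // Reuse the compressor from the previous section to produce the summary.
        // Caveat: compress() returns its input unchanged when the messages are under its
        // token threshold, and it keeps the last few rounds verbatim instead of summarizing
        // them, so this can fall through to the placeholder or miss the newest of the older
        // messages. A dedicated public summarize method on the compressor would be more robust.
        List<Message> compressed = compressor.compress(messages);
        return compressed.stream()
            .filter(m -> m instanceof SystemMessage)
            .map(Message::getContent)
            .findFirst()
            .orElse("（无摘要）");
    }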
}
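
To see why the pinned messages matter, the following sketch compares a plain window with a window that also carries forward important older messages. It is a simplified illustration under stated assumptions: messages are plain strings, the window is counted in messages rather than rounds, and the importance test is a caller-supplied predicate rather than the rules above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class PinnedWindowSketch {

    /** Plain sliding window: keeps only the last windowSize messages. */
    static List<String> naiveWindow(List<String> history, int windowSize) {
        int start = Math.max(0, history.size() - windowSize);
        return new ArrayList<>(history.subList(start, history.size()));
    }

    /** Window plus up to maxPinned important messages taken from before the window. */
    static List<String> pinnedWindow(List<String> history, int windowSize, int maxPinned,
                                     Predicate<String> isImportant) {
        int start = Math.max(0, history.size() - windowSize);
        List<String> result = new ArrayList<>();
        // Important messages that would otherwise have slid out of the window
        history.subList(0, start).stream()
            .filter(isImportant)
            .limit(maxPinned)
            .forEach(result::add);
        result.addAll(history.subList(start, history.size()));
        return result;
    }

    public static void main(String[] args) {
        List<String> history = List.of(
            "I joined in 2019-03", "ok", "hello", "thanks", "how much compensation?");
        // Hypothetical importance rule for the demo: the message contains a digit
        Predicate<String> hasDigit = s -> s.chars().anyMatch(Character::isDigit);

        System.out.println("naive:  " + naiveWindow(history, 2));
        System.out.println("pinned: " + pinnedWindow(history, 2, 5, hasDigit));
    }
}
```

The naive window loses the start date, which is exactly the failure from the opening story. The pinned variant keeps it at the cost of one extra message.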

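Importance Scoring: Deciding What to Keep and What to Compress

Messages differ in importance, so we need an algorithm that scores each one dynamically and decides which information is worth keeping.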

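/**
 * Message importance scoring service
 * 
 * Scores each message along several dimensions to judge its overall importance
 * The score is used to decide whether a message is kept during compression
 */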
@Service
public class ImportanceScoringService {
    
    /**
     * 重要性评分维度权重
     */
    private static final double POSITION_WEIGHT = 0.15;     // 位置权重（越新越重要）
    private static final double LENGTH_WEIGHT = 0.15;       // 长度权重（越长越可能重要）
    private static final double ENTITY_WEIGHT = 0.30;       // 实体丰富度（数字、名称）
    private static final double KEYWORD_WEIGHT = 0.25;      // 关键词权重
    private static final double ROLE_WEIGHT = 0.15;         // 角色权重（用户消息更重要）
    
    // 重要关键词列表（按业务场景配置）
    private static final Set<String> IMPORTANT_KEYWORDS = new HashSet<>(Arrays.asList(
        // 法律场景
        "合同", "协议", "违约", "赔偿", "诉讼", "证据", "期限", "条款",
        // 医疗场景  
        "症状", "诊断", "用药", "剂量", "禁忌", "过敏",
        // 金融场景
        "金额", "利率", "期限", "风险", "收益", "损失",
        // 通用重要词
        "重要", "关键", "必须", "不能", "禁止", "已经", "目前"
    ));
    
    /**
     * Compute a message's importance score (between 0 and 1)
     * 
     * @param message the message to score
     * @param positionInHistory zero-based index in the history (0 = earliest, historySize - 1 = latest)
     * @param historySize total number of messages in the history
     * @return importance score
     */
    public double score(Message message, int positionInHistory, int historySize) {
        String content = message.getContent();
        if (content == null || content.isEmpty()) return 0.0;
        
        double score = 0.0;
        
        // 1. Position score: the newer the message, the higher the score
        double positionRatio = historySize > 1 
            ? (double) positionInHistory / (historySize - 1) 
            : 1.0;
        double positionScore = Math.pow(positionRatio, 0.5); // square root, so older messages lose score gradually
        score += positionScore * POSITION_WEIGHT;
        
        // 2. 长度分数（150-500字的消息通常是关键陈述）
        int charCount = content.length();
        double lengthScore;
        if (charCount < 30) {
            lengthScore = 0.2;  // 太短，可能是简单回应
        } else if (charCount < 150) {
            lengthScore = 0.5;
        } else if (charCount < 500) {
            lengthScore = 1.0;  // 最佳长度范围
        } else {
            lengthScore = 0.8;  // 很长，可能包含很多信息但也有冗余
        }
        score += lengthScore * LENGTH_WEIGHT;
        
        // 3. 实体丰富度（数字、日期、人名等）
        double entityScore = computeEntityScore(content);
        score += entityScore * ENTITY_WEIGHT;
        
        // 4. 关键词命中
        double keywordScore = computeKeywordScore(content);
        score += keywordScore * KEYWORD_WEIGHT;
        
        // 5. 角色权重（用户消息通常比AI回复更重要）
        double roleScore = message instanceof UserMessage ? 1.0 : 0.7;
        score += roleScore * ROLE_WEIGHT;
        
        return Math.min(1.0, score);
    }
    
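    /**
     * Compute the entity-richness score
     */
    private double computeEntityScore(String text) {
        int entityCount = 0;
        
        // Numbers (amounts, years, quantities); (?s) lets "." match line breaks,
        // so multi-line messages are handled too
        if (text.matches("(?s).*\\d{4,}.*")) entityCount += 2;
        else if (text.matches("(?s).*\\d+.*")) entityCount += 1;
        
        // Date formats
        if (text.matches("(?s).*\\d{4}[-/年]\\d{1,2}[-/月].*")) entityCount += 2;
        
        // Percentages
        if (text.contains("%") || text.contains("百分之")) entityCount++;
        
        // Quoted content (usually a citation of something important):
        // count straight and curly double quotes, two marks per quoted span
        long quoteCount = text.chars()
            .filter(c -> c == '"' || c == '\u201C' || c == '\u201D').count() / 2;
        entityCount += quoteCount;
        
        // Normalize to 0-1
        return Math.min(1.0, entityCount / 5.0);
    }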
    
    /**
     * 计算关键词命中分数
     */
    private double computeKeywordScore(String text) {
        long hitCount = IMPORTANT_KEYWORDS.stream()
            .filter(text::contains)
            .count();
        
        return Math.min(1.0, hitCount / 3.0);  // 命中3个以上关键词得满分
    }
    
    /**
     * 批量评分，并按重要性排序
     */
    public List<ScoredMessage> scoreAndRank(List<Message> messages) {
        List<ScoredMessage> scored = new ArrayList<>();
        for (int i = 0; i < messages.size(); i++) {
            double s = score(messages.get(i), i, messages.size());
            scored.add(new ScoredMessage(messages.get(i), s, i));
        }
        scored.sort(Comparator.comparingDouble(ScoredMessage::getScore).reversed());
        return scored;
    }
    
    @Data
    @AllArgsConstructor
    public static class ScoredMessage {
        private Message message;
        private double score;
        private int originalIndex;
    }
}
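
The service above produces scores, but the step from scores to a keep-or-compress decision is not shown. One possible way to close that gap, sketched here as an assumption rather than the article's method, is a greedy selection under a token budget: take messages in descending score order, skip any that no longer fit, then restore the original order so the conversation still reads chronologically. The `Scored` record and its fields are hypothetical stand-ins for `ScoredMessage`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class BudgetedSelectionSketch {

    /** Hypothetical scored item: original position, estimated tokens, importance score. */
    record Scored(int index, int tokens, double score) {}

    /** Greedy by descending score; items that do not fit are skipped; result is in original order. */
    static List<Scored> selectWithinBudget(List<Scored> items, int tokenBudget) {
        List<Scored> byScore = new ArrayList<>(items);
        byScore.sort(Comparator.comparingDouble(Scored::score).reversed());

        List<Scored> kept = new ArrayList<>();
        int used = 0;
        for (Scored s : byScore) {
            if (used + s.tokens() <= tokenBudget) {
                kept.add(s);
                used += s.tokens();
            }
        }
        // Restore chronological order before handing the messages to the model
        kept.sort(Comparator.comparingInt(Scored::index));
        return kept;
    }

    public static void main(String[] args) {
        List<Scored> items = List.of(
            new Scored(0, 400, 0.9),
            new Scored(1, 50, 0.2),
            new Scored(2, 300, 0.7),
            new Scored(3, 350, 0.8),
            new Scored(4, 100, 0.4));
        selectWithinBudget(items, 800).forEach(System.out::println);
    }
}
```

With a budget of 800 tokens this keeps the messages at positions 0, 1 and 3. Positions 2 and 4 score higher than position 1 but do not fit once the two top-scoring messages are in. Greedy selection is simple but not optimal, and whatever it drops would still need to go through the summary path.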

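External Memory: Store History in a Vector Store and Retrieve on Demand

For memory that has to span long periods of time, the most effective approach is to store conversation information in a vector database and recall it by semantic relevance on each turn.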

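/**
 * External memory system backed by a vector store
 * 
 * Built on Spring AI's VectorStore abstraction
 * 
 * Workflow:
 * 1. When a conversation ends, extract key information and store it in the vector store
 * 2. When a new conversation starts, retrieve relevant memories for the current question
 * 3. Inject the retrieved memories into the system prompt
 */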
@Service
@Slf4j
public class VectorMemoryService {
    
    private final VectorStore vectorStore;
    private final ChatClient chatClient;
    private final ObjectMapper objectMapper;
    
    private static final String MEMORY_NAMESPACE = "user_memory";
    
    // 每次检索返回的记忆数量
    private static final int MEMORY_TOP_K = 5;
    
    // 记忆提取Prompt
    private static final String MEMORY_EXTRACTION_PROMPT = """
        请从以下对话中提取值得长期记住的关键信息，格式为JSON数组：
        
        要提取的信息类型：
        1. 用户的基本情况（背景、身份、偏好）
        2. 重要事实（数字、日期、关键决策）
        3. 用户的核心诉求和目标
        4. 已解决或待解决的问题
        
        对话内容：
        {conversation}
        
        请以JSON数组格式输出，每项包含：
        - "content": 记忆内容（简洁、具体）
        - "type": 类型（background/fact/goal/issue）
        - "importance": 重要性（high/medium/low）
        
        只输出JSON，不要其他内容。
        """;
    
    public VectorMemoryService(VectorStore vectorStore, 
                                ChatClient.Builder chatClientBuilder,
                                ObjectMapper objectMapper) {
        this.vectorStore = vectorStore;
        this.chatClient = chatClientBuilder.build();
        this.objectMapper = objectMapper;
    }
    
    /**
     * 保存对话记忆
     * 
     * @param userId 用户ID
     * @param sessionId 会话ID
     * @param messages 对话历史
     */
    public void saveMemory(String userId, String sessionId, List<Message> messages) {
        if (messages == null || messages.size() < 4) {
            // 对话太短，没有值得记忆的内容
            return;
        }
        
        try {
            // 提取关键记忆
            List<MemoryItem> memoryItems = extractMemoryItems(messages);
            
            if (memoryItems.isEmpty()) {
                log.debug("未提取到记忆项, userId={}", userId);
                return;
            }
            
            // 构建Document列表存入向量库
            List<Document> documents = memoryItems.stream()
                .map(item -> {
                    Map<String, Object> metadata = new HashMap<>();
                    metadata.put("userId", userId);
                    metadata.put("sessionId", sessionId);
                    metadata.put("type", item.getType());
                    metadata.put("importance", item.getImportance());
                    metadata.put("timestamp", Instant.now().toString());
                    metadata.put("namespace", MEMORY_NAMESPACE);
                    
                    return new Document(
                        // 存储格式：用户ID前缀 + 记忆内容
                        "用户[" + userId + "]：" + item.getContent(),
                        metadata
                    );
                })
                .collect(Collectors.toList());
            
            vectorStore.add(documents);
            
            log.info("保存用户记忆: userId={}, sessionId={}, 记忆项={}", 
                     userId, sessionId, memoryItems.size());
            
        } catch (Exception e) {
            log.error("保存记忆失败: userId={}", userId, e);
            // 记忆保存失败不应影响主流程
        }
    }
    
    /**
     * 检索相关记忆
     * 
     * @param userId 用户ID
     * @param currentQuery 当前用户问题
     * @return 相关记忆列表
     */
    public List<String> retrieveRelevantMemories(String userId, String currentQuery) {
        try {
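            // Build the filter: only this user's memories. FilterExpressionBuilder avoids
            // concatenating userId into the expression text, where a stray quote could
            // break the expression or widen the match
            FilterExpressionBuilder b = new FilterExpressionBuilder();
            Filter.Expression filterExpression = b.and(
                b.eq("userId", userId),
                b.eq("namespace", MEMORY_NAMESPACE)).build();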
            
            // 语义检索
            List<Document> memories = vectorStore.similaritySearch(
                SearchRequest.builder()
                    .query("用户[" + userId + "]的记忆：" + currentQuery)
                    .topK(MEMORY_TOP_K)
                    .filterExpression(filterExpression)
                    .similarityThreshold(0.65) // 相关性阈值
                    .build()
            );
            
            if (memories.isEmpty()) {
                return Collections.emptyList();
            }
            
            // 提取记忆内容
            return memories.stream()
                .map(doc -> {
                    // 去掉用户ID前缀，只返回记忆内容
                    String content = doc.getContent();
                    int colonIndex = content.indexOf("：");
                    return colonIndex >= 0 ? content.substring(colonIndex + 1) : content;
                })
                .collect(Collectors.toList());
            
        } catch (Exception e) {
            log.error("检索记忆失败: userId={}", userId, e);
            return Collections.emptyList();
        }
    }
    
    /**
     * 将记忆注入到系统提示词
     */
    public String buildPromptWithMemory(String baseSystemPrompt, 
                                         String userId, 
                                         String currentQuery) {
        List<String> memories = retrieveRelevantMemories(userId, currentQuery);
        
        if (memories.isEmpty()) {
            return baseSystemPrompt;
        }
        
        StringBuilder memorySection = new StringBuilder("\n\n【用户历史记忆】\n");
        memorySection.append("以下是关于该用户的历史记忆，可能与当前对话相关：\n");
        memories.forEach(m -> memorySection.append("- ").append(m).append("\n"));
        memorySection.append("\n请结合以上记忆理解用户的当前问题。");
        
        return baseSystemPrompt + memorySection;
    }
    
    /**
     * 从对话中提取记忆项
     */
    @SuppressWarnings("unchecked")
    private List<MemoryItem> extractMemoryItems(List<Message> messages) {
        // 构建对话文本
        StringBuilder conversation = new StringBuilder();
        for (Message msg : messages) {
            String role = msg instanceof UserMessage ? "用户" : "助手";
            conversation.append(role).append("：")
                        .append(msg.getContent()).append("\n\n");
        }
        
        String prompt = MEMORY_EXTRACTION_PROMPT.replace(
            "{conversation}", conversation.toString());
        
        String response = chatClient.prompt()
            .user(prompt)
            .call()
            .content();
        
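        // Parse the JSON
        try {
            // Extract just the JSON array, in case the model wrapped it in extra text
            String json = extractJsonFromResponse(response);
            // A TypeReference keeps the generic element type, unlike the raw List.class
            List<Map<String, String>> rawItems = objectMapper.readValue(
                json, new TypeReference<List<Map<String, String>>>() {});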
            
            return rawItems.stream()
                .map(item -> MemoryItem.builder()
                    .content(item.getOrDefault("content", ""))
                    .type(item.getOrDefault("type", "fact"))
                    .importance(item.getOrDefault("importance", "medium"))
                    .build())
                .filter(item -> !item.getContent().isEmpty())
                .collect(Collectors.toList());
        } catch (Exception e) {
            log.warn("记忆提取JSON解析失败: {}", e.getMessage());
            return Collections.emptyList();
        }
    }
    
    private String extractJsonFromResponse(String response) {
        int start = response.indexOf('[');
        int end = response.lastIndexOf(']');
        if (start >= 0 && end > start) {
            return response.substring(start, end + 1);
        }
        return "[]";
    }
    
    @Data
    @Builder
    static class MemoryItem {
        private String content;
        private String type;
        private String importance;
    }
}
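
What the vector store does during retrieval can be shown with a small in-memory stand-in: restrict to one user, score each memory by cosine similarity against the query vector, drop anything under the threshold, and return the top K. This is a sketch under loud assumptions. The three-dimensional embeddings are hand-written for the demo (a real system would get them from an embedding model), and `Memory`, `cosine` and `recall` are illustrative names, not Spring AI APIs.

```java
import java.util.Comparator;
import java.util.List;

class MemoryRecallSketch {

    /** Hypothetical stored memory: owner, text, and a precomputed embedding. */
    record Memory(String userId, String text, double[] embedding) {}

    /** Cosine similarity; returns 0 when either vector has zero length. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0 || normB == 0) ? 0.0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Filter by user, apply the similarity threshold, then keep the topK most similar. */
    static List<String> recall(List<Memory> store, String userId, double[] query,
                               int topK, double threshold) {
        return store.stream()
            .filter(m -> m.userId().equals(userId))
            .filter(m -> cosine(m.embedding(), query) >= threshold)
            .sorted(Comparator.comparingDouble((Memory m) -> cosine(m.embedding(), query)).reversed())
            .limit(topK)
            .map(Memory::text)
            .toList();
    }

    public static void main(String[] args) {
        List<Memory> store = List.of(
            new Memory("u1", "joined 2021-03", new double[] {1, 0, 0}),
            new Memory("u1", "prefers short answers", new double[] {0, 1, 0}),
            new Memory("u1", "contract ends 2024", new double[] {0.9, 0.1, 0}),
            new Memory("u2", "another user's memory", new double[] {1, 0, 0}));
        System.out.println(recall(store, "u1", new double[] {1, 0, 0}, 2, 0.65));
    }
}
```

The user filter matters as much as the similarity score: the `u2` memory matches the query perfectly but must never surface for `u1`.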

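User Profile Persistence: Keeping User Preferences Across Sessions

Some information needs to be remembered not just for this conversation but permanently: the user's preferences.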

/**
 * User profile service
 * 
 * Keeps a user's persistent preferences across sessions
 * Stored in Redis for fast reads and writes
 */
@Service
@Slf4j
public class UserProfileService {
    
    private final StringRedisTemplate redisTemplate;
    private final ObjectMapper objectMapper;
    
    // 用户画像Key前缀
    private static final String PROFILE_KEY_PREFIX = "user:profile:";
    // 用户偏好默认TTL（180天）
    private static final long PROFILE_TTL_DAYS = 180;
    
    public UserProfileService(StringRedisTemplate redisTemplate, 
                               ObjectMapper objectMapper) {
        this.redisTemplate = redisTemplate;
        this.objectMapper = objectMapper;
    }
    
    /**
     * 获取用户画像（如果不存在则返回空画像）
     */
    public UserProfile getProfile(String userId) {
        String key = PROFILE_KEY_PREFIX + userId;
        String json = redisTemplate.opsForValue().get(key);
        
        if (json == null) {
            return UserProfile.empty(userId);
        }
        
        try {
            return objectMapper.readValue(json, UserProfile.class);
        } catch (Exception e) {
            log.error("解析用户画像失败: userId={}", userId, e);
            return UserProfile.empty(userId);
        }
    }
    
    /**
     * 更新用户画像
     */
    public void updateProfile(String userId, Consumer<UserProfile> updater) {
        UserProfile profile = getProfile(userId);
        updater.accept(profile);
        profile.setLastUpdated(Instant.now());
        
        saveProfile(userId, profile);
    }
    
    /**
     * 保存用户画像
     */
    public void saveProfile(String userId, UserProfile profile) {
        String key = PROFILE_KEY_PREFIX + userId;
        try {
            String json = objectMapper.writeValueAsString(profile);
            redisTemplate.opsForValue().set(key, json, 
                Duration.ofDays(PROFILE_TTL_DAYS));
            log.debug("用户画像已更新: userId={}", userId);
        } catch (Exception e) {
            log.error("保存用户画像失败: userId={}", userId, e);
        }
    }
    
    /**
     * 从对话中自动更新用户画像
     * 
     * 这个方法应该在每次对话结束后异步调用
     */
    @Async
    public void autoUpdateFromConversation(String userId, List<Message> messages) {
        UserProfile profile = getProfile(userId);
        
        // 分析对话，提取可更新的画像信息
        for (Message msg : messages) {
            if (!(msg instanceof UserMessage)) continue;
            
            String content = msg.getContent();
            
            // 检测语言偏好
            detectLanguagePreference(content, profile);
            
            // 检测回复长度偏好
            detectResponseLengthPreference(content, profile);
            
            // 检测专业程度
            detectExpertiseLevel(content, profile);
        }
        
        // 增加总对话次数
        profile.setTotalConversations(profile.getTotalConversations() + 1);
        
        saveProfile(userId, profile);
    }
    
    private void detectLanguagePreference(String content, UserProfile profile) {
        long cnChars = content.chars().filter(c -> c >= 0x4E00 && c <= 0x9FFF).count();
        long totalChars = content.length();
        
        if (totalChars > 20) {
            double cnRatio = (double) cnChars / totalChars;
            if (cnRatio > 0.8) {
                profile.setPreferredLanguage("zh-CN");
            } else if (cnRatio < 0.2) {
                profile.setPreferredLanguage("en");
            }
        }
    }
    
    private void detectResponseLengthPreference(String content, UserProfile profile) {
        // 用户提问越详细，通常期望更详细的回答
        if (content.length() > 300) {
            profile.setPreferResponseLength("detailed");
        } else if (content.length() < 50) {
            profile.setPreferResponseLength("concise");
        }
    }
    
    private void detectExpertiseLevel(String content, UserProfile profile) {
        // Judge expertise by how many technical terms the message uses
        Set<String> techTerms = new HashSet<>(Arrays.asList(
            "API", "SDK", "JWT", "OAuth", "SQL", "NoSQL", "Docker", "K8s",
            "微服务", "分布式", "向量", "模型", "训练", "推理", "Fine-tuning"
        ));
        
        long techTermCount = techTerms.stream()
            .filter(content::contains)
            .count();
        
        if (techTermCount >= 3) {
            profile.setExpertiseLevel("expert");
        } else if (techTermCount >= 1) {
            // Must set the expertise level here; writing to preferredLanguage
            // would overwrite the user's language with "intermediate"
            profile.setExpertiseLevel("intermediate");
        }
    }
}

/**
 * 用户画像数据模型
 */
@Data
@JsonIgnoreProperties(ignoreUnknown = true)
public class UserProfile {
    
    private String userId;
    private String preferredLanguage = "zh-CN";
    private String preferResponseLength = "normal"; // concise/normal/detailed
    private String expertiseLevel = "intermediate";  // beginner/intermediate/expert
    private List<String> topicPreferences = new ArrayList<>();
    private Map<String, String> customPreferences = new HashMap<>();
    private int totalConversations = 0;
    private Instant lastUpdated;
    private Instant createdAt;
    
    public static UserProfile empty(String userId) {
        UserProfile profile = new UserProfile();
        profile.setUserId(userId);
        profile.setCreatedAt(Instant.now());
        profile.setLastUpdated(Instant.now());
        return profile;
    }
    
    /**
     * 将用户画像转换为系统提示词的附加信息
     */
    public String toSystemPromptAddition() {
        StringBuilder sb = new StringBuilder();
        sb.append("\n【用户偏好】\n");
        
        if ("zh-CN".equals(preferredLanguage)) {
            sb.append("- 请用中文回答\n");
        }
        
        switch (preferResponseLength) {
            case "concise" -> sb.append("- 回答要简洁，不超过100字\n");
            case "detailed" -> sb.append("- 可以提供详细的解释和示例\n");
        }
        
        switch (expertiseLevel) {
            case "expert" -> sb.append("- 用户是专业人士，可以使用专业术语，无需解释基础概念\n");
            case "beginner" -> sb.append("- 用户是初学者，请用简单易懂的语言，避免专业术语\n");
        }
        
        return sb.toString();
    }
}

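Memory Forgetting: Automatic Cleanup of Aged Memories

More memory is not always better. Stale memories can interfere with the current answer, so we need a time-decay algorithm that ages memories automatically and cleans up the ones that are no longer relevant.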

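/**
 * Memory aging and cleanup service
 * 
 * Implements a time-decay algorithm:
 * the older a memory, the lower its importance score
 * Once the score drops below the threshold, the memory is deleted
 */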
@Service
@Slf4j
public class MemoryAgingService {
    
    private final VectorStore vectorStore;
    private final QdrantClient qdrantClient;
    
    // 时间衰减常数（天）：记忆在这段时间后重要性减半
    private static final double HALF_LIFE_DAYS = 30.0;
    
    // 删除阈值：衰减后分数低于此值的记忆将被删除
    private static final double DELETION_THRESHOLD = 0.1;
    
    public MemoryAgingService(VectorStore vectorStore, QdrantClient qdrantClient) {
        this.vectorStore = vectorStore;
        this.qdrantClient = qdrantClient;
    }
    
    /**
     * 计算记忆的时间衰减分数
     * 
     * 使用指数衰减：score = originalScore × e^(-λ × days)
     * 其中 λ = ln(2) / halfLife
     * 
     * @param originalImportance 原始重要性（high=1.0, medium=0.6, low=0.3）
     * @param createdAt 记忆创建时间
     * @return 当前衰减后的分数
     */
    public double computeAgedScore(String originalImportance, Instant createdAt) {
        double baseScore = switch (originalImportance) {
            case "high" -> 1.0;
            case "medium" -> 0.6;
            default -> 0.3;
        };
        
        long daysSinceCreation = ChronoUnit.DAYS.between(createdAt, Instant.now());
        
        // 指数衰减
        double lambda = Math.log(2) / HALF_LIFE_DAYS;
        double decayFactor = Math.exp(-lambda * daysSinceCreation);
        
        return baseScore * decayFactor;
    }
    
    /**
     * 定期清理过期记忆（每天凌晨3点执行）
     */
    @Scheduled(cron = "0 0 3 * * *")
    public void cleanupAgedMemories() {
        log.info("开始清理老化记忆...");
        
        int deletedCount = 0;
        
        try {
            // Only points in the user-memory namespace are candidates for cleanup
            Filter memoryFilter = Filter.newBuilder()
                .addMust(Condition.newBuilder()
                    .setField(FieldCondition.newBuilder()
                        .setKey("namespace")
                        .setMatch(Match.newBuilder()
                            .setKeyword("user_memory")
                            .build())
                        .build())
                    .build())
                .build();
            
            List<PointId> toDelete = new ArrayList<>();
            PointId offset = null;
            
            // Page through every matching point with the Qdrant scroll API. A single
            // request returns at most `limit` points, so keep following the next page
            // offset until the server stops returning one
            do {
                ScrollPoints.Builder scrollRequest = ScrollPoints.newBuilder()
                    .setCollectionName("knowledge_base")
                    .setLimit(1000)
                    .setFilter(memoryFilter)
                    .setWithPayload(WithPayloadSelector.newBuilder().setEnable(true).build());
                if (offset != null) {
                    scrollRequest.setOffset(offset);
                }
                
                ScrollResponse response = qdrantClient.scrollAsync(scrollRequest.build()).get();
                
                for (RetrievedPoint point : response.getResultList()) {
                    Map<String, Value> payload = point.getPayloadMap();
                    
                    String timestampStr = payload.containsKey("timestamp") 
                        ? payload.get("timestamp").getStringValue() : null;
                    String importance = payload.containsKey("importance")
                        ? payload.get("importance").getStringValue() : "medium";
                    
                    if (timestampStr == null) continue;
                    
                    Instant createdAt = Instant.parse(timestampStr);
                    double agedScore = computeAgedScore(importance, createdAt);
                    
                    if (agedScore < DELETION_THRESHOLD) {
                        toDelete.add(point.getId());
                    }
                }
                
                offset = response.hasNextPageOffset() ? response.getNextPageOffset() : null;
            } while (offset != null);
            
            // Delete all aged memories in one batch
            if (!toDelete.isEmpty()) {
                qdrantClient.deleteAsync("knowledge_base", toDelete).get();
                
                deletedCount = toDelete.size();
                log.info("清理老化记忆完成，删除 {} 条", deletedCount);
            } else {
                log.info("没有需要清理的老化记忆");
            }
            
        } catch (Exception e) {
            log.error("清理老化记忆失败", e);
        }
    }
}

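Putting It Together: The Complete Context Management Pipeline

import java.util.List;

import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.chat.messages.Message;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;

/**
 * Main context management service.
 * 
 * Combines all of the context management strategies behind a single API.
 */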
@Service
@Slf4j
public class ContextManagementService {
    
    private final SlidingWindowContextManager slidingWindow;
    private final VectorMemoryService vectorMemory;
    private final UserProfileService userProfile;
    private final ConversationHistoryCompressor compressor;
    
    public ContextManagementService(SlidingWindowContextManager slidingWindow,
                                     VectorMemoryService vectorMemory,
                                     UserProfileService userProfile,
                                     ConversationHistoryCompressor compressor) {
        this.slidingWindow = slidingWindow;
        this.vectorMemory = vectorMemory;
        this.userProfile = userProfile;
        this.compressor = compressor;
    }
    
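    /**
     * Builds the optimized context to pass to the LLM.
     * 
     * @param userId user ID
     * @param sessionId session ID
     * @param fullHistory full conversation history
     * @param currentQuery the user's current question
     * @param baseSystemPrompt base system prompt
     * @return the optimized message list
     */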
    public List<Message> buildOptimizedContext(String userId, String sessionId,
                                                List<Message> fullHistory,
                                                String currentQuery,
                                                String baseSystemPrompt) {
        
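        // 1. Load the user profile and use it to enrich the system prompt
        //    (fall back to the base prompt when no profile is available)
        UserProfile profile = userProfile.getProfile(userId);
        String enhancedSystemPrompt = profile != null
            ? baseSystemPrompt + profile.toSystemPromptAddition()
            : baseSystemPrompt;
        
        // 2. Retrieve relevant historical memories (external memory store)
        String systemPromptWithMemory = vectorMemory.buildPromptWithMemory(
            enhancedSystemPrompt, userId, currentQuery);
        
        // 3. Manage the current session history (sliding window + compression)
        List<Message> managedHistory = slidingWindow.manageContext(
            fullHistory, systemPromptWithMemory);
        
        log.debug("Context built: userId={}, original messages={}, managed messages={}",
                  userId, fullHistory.size(), managedHistory.size());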
        
        return managedHistory;
    }
    
    /**
     * Wrap-up work after a conversation ends (runs asynchronously).
     */
    @Async
    public void postConversationProcessing(String userId, String sessionId,
                                            List<Message> fullHistory) {
        try {
            // 1. Save the session memory to the vector store
            vectorMemory.saveMemory(userId, sessionId, fullHistory);
            
            // 2. Update the user profile
            userProfile.autoUpdateFromConversation(userId, fullHistory);
            
            log.debug("Post-conversation processing finished: userId={}, sessionId={}", userId, sessionId);
        } catch (Exception e) {
            log.error("Post-conversation processing failed: userId={}", userId, e);
        }
    }
}

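Context Testing: Designing Test Cases That Verify the Context Management Logic

import static org.assertj.core.api.Assertions.assertThat;

import java.util.ArrayList;
import java.util.List;

import org.junit.jupiter.api.Disabled;
import org.junit.jupiter.api.Test;
import org.springframework.ai.chat.messages.AssistantMessage;
import org.springframework.ai.chat.messages.Message;
import org.springframework.ai.chat.messages.SystemMessage;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

/**
 * Context management test suite.
 */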
@SpringBootTest
class ContextManagementTest {
    
    @Autowired
    private ConversationHistoryCompressor compressor;
    
    @Autowired
    private ImportanceScoringService importanceScoring;
    
    @Autowired
    private SlidingWindowContextManager slidingWindow;
    
    /**
     * Test: compression should trigger once the threshold is exceeded.
     */
    @Test
    void shouldCompressWhenExceedingThreshold() {
        // Build a long conversation history (more than 3000 tokens)
        List<Message> longHistory = generateLongConversation(30);
        
        // Compare token counts before and after compression
        int originalTokens = estimateTokens(longHistory);
        List<Message> compressed = compressor.compress(longHistory);
        int compressedTokens = estimateTokens(compressed);
        
        assertThat(compressedTokens).isLessThan(originalTokens);
        assertThat(compressedTokens).isLessThan(3000); // should fall below the trigger threshold
        
        // The compressed history should contain a summary message.
        // "摘要" ("summary") is the marker that the Chinese compression prompt asks the LLM to emit.
        boolean hasSummary = compressed.stream()
            .filter(m -> m instanceof SystemMessage)
            .anyMatch(m -> m.getContent().contains("摘要"));
        assertThat(hasSummary).isTrue();
        
        System.out.printf("Before: %d tokens, after: %d tokens, compression ratio: %.1f%%%n",
                          originalTokens, compressedTokens,
                          (1.0 - (double)compressedTokens / originalTokens) * 100);
    }
    
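    /**
     * Test: importance scoring should single out key messages.
     */
    @Test
    void shouldGiveHigherScoreToImportantMessages() {
        // The sample messages stay in Chinese, the language of the conversations being scored.
        // Gloss: "My name is Zhang Wei. I joined in March 2020 at a monthly salary of 15,000 yuan.
        // The contract specifies a 3-month probation period, and the company now wants to dismiss
        // me early. How much compensation can I get? Contract number: HR-2020-0315"
        Message importantMsg = new UserMessage(
            "我叫张伟，2020年3月入职，月薪15000元，" +
            "合同约定试用期3个月，现在公司要提前辞退我，" +
            "请问我能获得多少赔偿？合同编号：HR-2020-0315");
        
        // Gloss: "OK, thanks!"
        Message trivialMsg = new UserMessage("好的，谢谢！");
        
        double importantScore = importanceScoring.score(importantMsg, 5, 10);
        double trivialScore = importanceScoring.score(trivialMsg, 6, 10);
        
        System.out.printf("Important message score: %.2f, trivial message score: %.2f%n", 
                          importantScore, trivialScore);
        
        assertThat(importantScore).isGreaterThan(0.7);
        assertThat(trivialScore).isLessThan(0.4);
        assertThat(importantScore).isGreaterThan(trivialScore);
    }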
    
    /**
     * Test: the sliding window should keep the most recent conversation.
     */
    @Test
    void slidingWindowShouldKeepRecentConversation() {
        List<Message> history = generateConversation(20); // 20 rounds of conversation
        String systemPrompt = "You are an assistant";
        
        List<Message> managed = slidingWindow.manageContext(history, systemPrompt);
        
        // The last 5 rounds (10 messages) should be kept intact
        List<Message> lastTen = history.subList(history.size() - 10, history.size());
        
        for (Message recentMsg : lastTen) {
            assertThat(managed).contains(recentMsg);
        }
        
        // The managed context must not outgrow the original history;
        // the +2 leaves room for system messages added by the manager
        assertThat(managed.size()).isLessThan(history.size() + 2);
    }
    
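    /**
     * Test: cross-session memory retrieval returns the right memories.
     */
    @Test
    @Disabled("Requires a live vector store connection") // enable for integration testing
    void shouldRetrieveRelevantMemoryAcrossSessions() {
        // TODO: verify that memories from different sessions are retrieved correctly
    }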
    
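    // Helper: generate a test conversation
    private List<Message> generateConversation(int rounds) {
        List<Message> messages = new ArrayList<>();
        for (int i = 0; i < rounds; i++) {
            messages.add(new UserMessage("User message for round " + (i+1) + ", content..."));
            messages.add(new AssistantMessage("Assistant reply for round " + (i+1) + ", content..."));
        }
        return messages;
    }
    
    // Helper: generate a conversation with long messages
    private List<Message> generateLongConversation(int rounds) {
        List<Message> messages = new ArrayList<>();
        // The parentheses make repeat(5) apply to the whole sentence, not only to the second literal
        String longContent = ("This is a long message with many details, " +
            "such as the user's background, specific situation, and the questions to be answered. ").repeat(5);
        for (int i = 0; i < rounds; i++) {
            messages.add(new UserMessage(longContent + "(round " + (i+1) + ")"));
            messages.add(new AssistantMessage("Based on the information you provided, my answer is..." + "(answer content)".repeat(3)));
        }
        return messages;
    }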
    
    private int estimateTokens(List<Message> messages) {
        return messages.stream()
            .mapToInt(m -> m.getContent().length() * 2 / 3)
            .sum();
    }
}

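Performance Data

Measured in a production-environment test (100 concurrent users, 20 rounds per conversation on average):

Strategy	Tokens per round	Average latency	User experience score	Monthly cost
No management (full history kept)	Grows with each round	200ms→2s	3.2/5	¥8,200
Simple sliding window (N=10)	Stable, but loses information	180ms	3.5/5	¥3,100
Compression	40% lower	280ms	4.1/5	¥2,800
External vector memory	60% lower	350ms	4.6/5	¥2,200
Full solution (compression + vector memory + profile)	65% lower	400ms	4.8/5	¥2,100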

FAQ

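Q1: How do you account for the cost of the LLM-generated summary when compressing history?

A: Each compression is one fairly short LLM request (about 1,000 input tokens). In a 20-round conversation, compression runs roughly once every 5 rounds, so its cost comes to about 10% of the main conversation's tokens. In return, compression cuts token consumption in each subsequent round by about 40%, so it pays for itself overall.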
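
The trade-off in this answer can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope model: it assumes the two figures quoted above (about 10% overhead, about 40% saved) apply uniformly to the whole conversation, and the class and method names are hypothetical.

```java
/** Back-of-the-envelope check of the compression trade-off described in Q1. */
final class CompressionCostEstimate {

    /**
     * Token usage with compression, as a fraction of the uncompressed baseline.
     *
     * @param savedRatio    fraction of main-conversation tokens saved (e.g. 0.40)
     * @param overheadRatio tokens spent on summary calls, as a fraction of the baseline (e.g. 0.10)
     */
    static double relativeUsage(double savedRatio, double overheadRatio) {
        return (1.0 - savedRatio) + overheadRatio;
    }

    public static void main(String[] args) {
        double usage = relativeUsage(0.40, 0.10);
        System.out.printf("Relative usage: %.0f%%, net saving: %.0f%%%n",
                usage * 100, (1.0 - usage) * 100);
    }
}
```

Under these assumptions the compressed conversation uses about 70% of the baseline tokens, a net saving of roughly 30% once the summary calls are paid for.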

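Q2: How much latency does vector memory retrieval add to each conversation turn?

A: Vector retrieval usually takes 50-150 ms (local Qdrant). Because the retrieved memories are part of the prompt, the lookup has to finish before the main LLM call is sent, but it can run in parallel with the other preparation steps (such as the profile lookup), which hides part of the added latency.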
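
One way to overlap the retrieval with other independent work is sketched below using only `CompletableFuture` from the JDK. The two suppliers are hypothetical stand-ins for the memory search and the profile lookup; the real services in this article are not called here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Sketch: overlap two independent lookups so their latencies do not add up. */
final class ParallelContextLookup {

    /**
     * Runs both suppliers concurrently and joins their results into one prompt suffix.
     * Both suppliers are hypothetical stand-ins, not the article's actual services.
     */
    static String loadInParallel(Supplier<String> memoryLookup, Supplier<String> profileLookup) {
        CompletableFuture<String> memory = CompletableFuture.supplyAsync(memoryLookup);
        CompletableFuture<String> profile = CompletableFuture.supplyAsync(profileLookup);
        // thenCombine runs once both futures have completed; join() waits for that result
        return memory.thenCombine(profile, (m, p) -> p + "\n" + m).join();
    }

    public static void main(String[] args) {
        String result = loadInParallel(
                () -> "memory: contract signed in 2020",
                () -> "profile: prefers concise answers");
        System.out.println(result);
    }
}
```

With this shape the waiting time is roughly that of the slower lookup instead of the sum of both.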

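Q3: User profiles are stored in Redis. What happens if Redis goes down?

A: The user profile is an enhancement, not a core feature. When Redis is down, the system should degrade to a no-profile mode and keep serving requests normally, without affecting the main flow.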
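
A minimal sketch of that degradation, assuming the profile lookup surfaces a store outage as a runtime exception. `ProfileFallback` and the supplier are hypothetical names used only for illustration.

```java
import java.util.function.Supplier;

/** Sketch: fall back to the base prompt when the profile store is unavailable. */
final class ProfileFallback {

    /** Returns the profile text, or an empty string if the lookup fails or yields nothing. */
    static String profileOrEmpty(Supplier<String> profileLookup) {
        try {
            String addition = profileLookup.get();
            return addition != null ? addition : "";
        } catch (RuntimeException e) {
            // Degrade to no-profile mode instead of failing the whole request
            return "";
        }
    }

    /** Builds the system prompt, enriched with the profile only when it is available. */
    static String buildSystemPrompt(String basePrompt, Supplier<String> profileLookup) {
        return basePrompt + profileOrEmpty(profileLookup);
    }

    public static void main(String[] args) {
        String prompt = buildSystemPrompt("You are a legal assistant.", () -> {
            throw new IllegalStateException("Redis unavailable");
        });
        System.out.println(prompt);
    }
}
```

In a real service the catch block would also log the failure so that the outage stays visible.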

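Q4: Could cross-session memory retrieve another user's information by mistake?

A: Filtering by userId keeps users isolated. It is also advisable to desensitize content when memories are stored, and for highly sensitive domains (finance, healthcare, law) to store memory data encrypted.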
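
The isolation rule amounts to applying the userId filter before any ranking or recall happens. The in-memory sketch below illustrates only that ordering; `MemoryRecord` and `memoriesFor` are hypothetical, and a real deployment would express the same filter in the vector store query.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Sketch: scope memory recall to one user before anything else is done with it. */
final class UserScopedMemory {

    /** Hypothetical stored memory: the owning user plus the remembered text. */
    record MemoryRecord(String userId, String content) {}

    /** Returns only the memories owned by the given user. */
    static List<String> memoriesFor(String userId, List<MemoryRecord> all) {
        return all.stream()
                .filter(r -> r.userId().equals(userId))
                .map(MemoryRecord::content)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<MemoryRecord> store = List.of(
                new MemoryRecord("u1", "joined the company in March 2020"),
                new MemoryRecord("u2", "asked about a rental dispute"));
        System.out.println(memoriesFor("u1", store));
    }
}
```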

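Q5: After how many rounds should compression be triggered?

A: Rules of thumb:

General-purpose chat assistant: trigger after more than 15 rounds
Complex professional consulting (law, healthcare): trigger after more than 10 rounds, because each round carries denser information
Code assistant: trigger after more than 20 rounds; code-related content consumes many tokens but is highly structured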
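
These rules of thumb could be encoded as a small lookup, sketched below. The enum and its names are hypothetical; only the three thresholds come from the list above.

```java
/** Sketch: round-count thresholds for triggering compression, per assistant type. */
final class CompressionTrigger {

    /** Hypothetical assistant categories with the rule-of-thumb thresholds from Q5. */
    enum AssistantType {
        GENERAL(15),
        PROFESSIONAL_CONSULTING(10),
        CODE(20);

        private final int triggerRounds;

        AssistantType(int triggerRounds) {
            this.triggerRounds = triggerRounds;
        }

        /** True once the conversation has gone past this type's threshold. */
        boolean shouldCompress(int completedRounds) {
            return completedRounds > triggerRounds;
        }
    }

    public static void main(String[] args) {
        for (AssistantType type : AssistantType.values()) {
            System.out.printf("%s compresses at round 12: %b%n", type, type.shouldCompress(12));
        }
    }
}
```

A round count is only a proxy for context size, so a check like this would normally sit alongside the token-based trigger rather than replace it.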

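Summary

The core tension in context management is that the token window is finite, while users expect the AI to remember everything.

The way to resolve it is layered management:

Recent conversation: kept in full (the near end of the sliding window)
Important history: identified intelligently, then either force-kept or kept as a summary
Cross-session memory: stored in a vector store and recalled by semantic relevance
User preferences: stored persistently and used to enrich every conversation automatically

After Zhao Lei rebuilt the system's context management logic, the rate of complaints about the AI's "memory" fell from 15% to 0.8%, and the user satisfaction score rose from 3.2 to 4.7.

He summed it up this way:

"Managing an AI's memory is like the human brain: you can't keep everything in working memory, but the things that matter have to be recallable at any moment. Get this mechanism right and the AI really does feel like an assistant who knows you, not a stranger who has amnesia every time."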