Post 1643: An AI-Driven Intelligent Customer Service System, the Full Pipeline from Intent Recognition to Multi-Turn Dialogue

Lao Zhang · 2026/4/30 · about 11 min read

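I have been wrestling with intelligent customer service for almost two years now. Along the way I have hit more pitfalls than I have had glasses of water.

The earliest version used keyword matching for intent recognition. Two weeks after launch, our own head of customer service tore into it: "This thing is worse than our old FAQ. A user asks about a refund and it recommends how to top up. It's embarrassing." I felt genuinely ashamed at that moment.

Then large language models (LLMs) arrived and I rebuilt the whole thing. Today I will walk through the full technical evolution and the final design.

Where the real difficulty of intelligent customer service lies

Before designing a solution, you have to be clear about what makes intelligent customer service hard.

Many people think it is just a "question-answering system". It is far from that. Conversations between users and customer service often look like this:

"Can I return the one I bought last time?" (needs conversation context, plus an order lookup)
"Why is it so expensive" + "Forget it, just swap it for a red one" (emotion plus an intent switch)
"My friend said you have a promotion" + "Oh right, I also wanted to ask about that earlier question" (topic jump)

Traditional intent classification cannot handle these patterns at all. You need multi-turn dialogue management, and it has to be connected to the user's real account data (orders, points, account).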

[Figure: overall architecture diagram]

Session management: state is the core

State management is the heart of multi-turn dialogue. What should a session's state contain?

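import java.time.Instant;
import java.util.Deque;
import java.util.List;
import java.util.Map;

import lombok.Builder;
import lombok.Data;

@Data
@Builder
public class ConversationSession {
    private String sessionId;
    private String userId;
    private String channelType;  // wechat/app/web
    private String businessLine; // business line; selects the intent list and filters the knowledge base
    
    // Conversation history (sliding window, keeps the most recent N turns)
    private List<Message> messageHistory;
    
    // Intent tracking
    private String currentIntent;     // current intent
    private String previousIntent;    // intent of the previous turn
    private Map<String, Object> intentSlots;  // intent slots (e.g. order number, product ID)
    
    // User state
    private EmotionState emotionState;  // emotional state
    private int frustrationLevel;       // 0-10; the higher it is, the more a human handoff is needed
    
    // Business context
    private String relatedOrderId;     // related order
    private String relatedProductId;   // related product
    
    // Session metadata
    private Instant startTime;
    private Instant lastActiveTime;
    private int turnCount;             // number of turns so far
    private boolean transferredToHuman;
    
    // Stack of unresolved issues (a user may raise several questions at once)
    private Deque<PendingIssue> pendingIssues;
}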
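The `pendingIssues` deque is what lets the bot return to "that earlier question" after a topic jump. The article does not show how it is used, so the following is only a sketch under assumptions: issues are plain strings rather than `PendingIssue` objects, and `PendingIssueStack`, `park`, and `resumeNext` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper, not from the article: last-in, first-out handling of unresolved issues. */
final class PendingIssueStack {
    private final Deque<String> issues = new ArrayDeque<>();

    /** Call when the user raises a new topic before the current one is resolved. */
    void park(String issueSummary) {
        issues.push(issueSummary);   // push() adds at the head of the deque
    }

    /** Call after an issue is resolved; returns the most recently parked issue, or null if none remain. */
    String resumeNext() {
        return issues.poll();        // poll() removes from the head, so order is last-in, first-out
    }

    boolean hasPending() {
        return !issues.isEmpty();
    }
}
```

Resuming the most recently parked issue first is a design choice; a queue (first-in, first-out) would be equally defensible if older questions should be answered first.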

Session state is stored in Redis with a 30-minute expiry (30 minutes of silence counts as the end of the session):

@Service
public class SessionManager {
    
    @Autowired
    private RedisTemplate<String, Object> redisTemplate;
    
    private static final Duration SESSION_TTL = Duration.ofMinutes(30);
    private static final int MAX_HISTORY_TURNS = 10;  // keep the most recent 10 turns
    
    public ConversationSession getOrCreate(String sessionId, String userId) {
        String key = "session:" + sessionId;
        ConversationSession session = (ConversationSession) redisTemplate
                .opsForValue().get(key);
        
        if (session == null) {
            session = ConversationSession.builder()
                    .sessionId(sessionId)
                    .userId(userId)
                    .messageHistory(new ArrayList<>())
                    .intentSlots(new HashMap<>())
                    .emotionState(EmotionState.NEUTRAL)
                    .frustrationLevel(0)
                    .startTime(Instant.now())
                    .lastActiveTime(Instant.now())
                    .turnCount(0)
                    .pendingIssues(new ArrayDeque<>())
                    .build();
        }
        
        return session;
    }
    
    public void addMessage(ConversationSession session, 
                           String role, String content) {
        Message msg = new Message(role, content, Instant.now());
        session.getMessageHistory().add(msg);
        
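        // Sliding window: once over the limit, drop the oldest messages (one turn = two messages)
        while (session.getMessageHistory().size() > MAX_HISTORY_TURNS * 2) {
            session.getMessageHistory().remove(0);
        }
        
        // Count a turn once per user message (assumes the role value "user"), not once per message
        if ("user".equals(role)) {
            session.setTurnCount(session.getTurnCount() + 1);
        }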
        session.setLastActiveTime(Instant.now());
        
        save(session);
    }
    
    public void save(ConversationSession session) {
        String key = "session:" + session.getSessionId();
        redisTemplate.opsForValue().set(key, session, SESSION_TTL);
    }
}

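Intent recognition: don't rely on a classification model alone

My intent recognition went through three stages:

Stage 1: keyword matching (abandoned). As described above, it was too dumb to be worth discussing.

Stage 2: fine-tuned BERT classification. Using BERT for intent classification worked much better, but it had one fatal problem: once there were more than 100 intents, maintenance cost exploded. Every new intent required retraining, and the operations team suffered for it.

Stage 3: LLM plus rules hybrid (current approach)

The core idea: use the LLM to understand intent, but don't let it "freely generate" an intent. Instead, give it a list of candidate intents and have it choose: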

@Service
public class IntentRecognitionService {
    
    @Autowired
    private ChatClient chatClient;
    
    @Autowired
    private IntentRepository intentRepository;
    
    public IntentResult recognize(ConversationSession session, String userMessage) {
        // Load the intent list for the current business line
        List<IntentDefinition> intents = intentRepository.findByBusinessLine(
            session.getBusinessLine()
        );
        
        String intentListStr = intents.stream()
                .map(i -> String.format("- %s: %s (examples: %s)",
                    i.getIntentCode(), i.getDescription(), i.getExamples()))
                .collect(Collectors.joining("\n"));
        
        // Build the conversation history context
        String historyContext = buildHistoryContext(session.getMessageHistory());
        
        String prompt = """
                You are an intent recognition engine. Identify the user's current intent from the candidate intent list.
                
                [Conversation history summary]
                %s
                
                [Current user message]
                %s
                
                [Candidate intents]
                %s
                
                Return JSON in this format:
                {
                  "intent": "intent code (choose from the list; use OTHER when unsure)",
                  "confidence": 0.0-1.0,
                  "slots": {"slot name": "slot value"},
                  "emotion": "POSITIVE/NEUTRAL/NEGATIVE/ANGRY",
                  "needsClarification": true/false,
                  "clarificationQuestion": "if clarification is needed, write the clarifying question here"
                }
                
                Note: if the user message contains multiple intents, pick the primary one.
                """.formatted(historyContext, userMessage, intentListStr);
        
        String response = chatClient.prompt()
                .user(prompt)
                .call()
                .content();
        
        IntentResult result = parseIntentResult(response);
        
        // Update the session's intent state
        updateSessionIntent(session, result);
        
        return result;
    }
    
    private String buildHistoryContext(List<Message> history) {
        if (history.isEmpty()) return "(new session, no history)";
        
        // Take only the last 3 turns (6 messages)
        int start = Math.max(0, history.size() - 6);
        return history.subList(start, history.size()).stream()
                .map(m -> m.getRole() + ": " + m.getContent())
                .collect(Collectors.joining("\n"));
    }
}

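This approach has several advantages:

Adding a new intent only takes one new database record, with no retraining
The LLM understands context, so elliptical questions like "Can I return it?" are recognized correctly
Emotion recognition comes along for free, so no separate emotion classifier is needed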
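Because the model is asked to choose from a fixed list, whatever parses its JSON can enforce that contract. `parseIntentResult` is not shown in the article; the following is a minimal sketch of the validation it might apply, assuming the hypothetical names `IntentGuard`, `normalizeCode`, and `clampConfidence`, and assuming intent codes are stored in upper case.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper, not from the article: keeps model output inside the candidate list. */
final class IntentGuard {
    static final String FALLBACK = "OTHER";

    /** Returns the model's intent code if it is a known candidate, otherwise OTHER. */
    static String normalizeCode(String modelCode, Set<String> candidateCodes) {
        if (modelCode == null) {
            return FALLBACK;
        }
        String code = modelCode.trim().toUpperCase(Locale.ROOT);
        return candidateCodes.contains(code) ? code : FALLBACK;
    }

    /** Clamps a model-reported confidence into [0.0, 1.0]; NaN is treated as 0.0. */
    static double clampConfidence(double confidence) {
        if (Double.isNaN(confidence)) {
            return 0.0;
        }
        return Math.max(0.0, Math.min(1.0, confidence));
    }
}
```

With this guard in place, an invented code such as `FREE_GIFT` degrades to `OTHER` instead of leaking into downstream routing.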

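Knowledge base retrieval: the engineering details of RAG

A customer service knowledge base usually has several thousand to tens of thousands of FAQ entries, plus product manuals, policy documents, and so on. Putting all of it into the prompt is clearly unrealistic, so you need RAG.

But RAG has a classic problem: what it retrieves is not necessarily the most relevant content. A pitfall I hit:

A user asked "How many days does delivery take?" and retrieval returned the document on "How to get compensation for a damaged parcel", because both contain the word for "delivery" and the vector similarity was fairly high.

The solution is hybrid retrieval: vector similarity plus BM25 keyword matching, with the two rankings fused (the code uses Reciprocal Rank Fusion over ranks rather than a weighted sum of raw scores):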

@Service
public class KnowledgeRetrievalService {
    
    @Autowired
    private VectorStore vectorStore;
    
    @Autowired
    private ElasticsearchClient esClient;
    
    @Autowired
    private KnowledgeRepository knowledgeRepo;  // used by boostByIntent; type name assumed
    
    public List<KnowledgeChunk> retrieve(
            String query, 
            String businessLine,
            IntentResult intent) {
        
        // 1. Vector search (semantic relevance)
        List<Document> vectorResults = vectorStore.similaritySearch(
            SearchRequest.query(query)
                .withTopK(10)
                .withFilterExpression("businessLine == '" + businessLine + "'")
                .withSimilarityThreshold(0.6)
        );
        
        // 2. BM25 keyword search
        List<KnowledgeChunk> bm25Results = bm25Search(query, businessLine, 10);
        
        // 3. Fused ranking (RRF, Reciprocal Rank Fusion): score = 1 / (k + rank), k = 60, rank starts at 1
        Map<String, Double> scoreMap = new HashMap<>();
        
        for (int i = 0; i < vectorResults.size(); i++) {
            String id = vectorResults.get(i).getId();
            scoreMap.merge(id, 1.0 / (60 + i + 1), Double::sum);
        }
        
        for (int i = 0; i < bm25Results.size(); i++) {
            String id = bm25Results.get(i).getId();
            scoreMap.merge(id, 1.0 / (60 + i + 1), Double::sum);
        }
        
        // 4. Boost by intent
        if (intent != null && !intent.getIntentCode().equals("OTHER")) {
            boostByIntent(scoreMap, intent.getIntentCode());
        }
        
        // Take the top 5
        return scoreMap.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(5)
                .map(e -> getChunkById(e.getKey()))
                .filter(Objects::nonNull)
                .collect(Collectors.toList());
    }
    
    private void boostByIntent(Map<String, Double> scoreMap, String intentCode) {
        // For a refund intent, for example, raise the scores of refund-related documents
        List<String> intentRelatedIds = knowledgeRepo.findIdsByIntent(intentCode);
        for (String id : intentRelatedIds) {
            scoreMap.computeIfPresent(id, (k, v) -> v * 1.5);
        }
    }
}
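To see the fusion step in isolation, here is a self-contained sketch of Reciprocal Rank Fusion over plain ID lists. `RrfFusion` and `fuse` are hypothetical names, ranks are counted from 1 with the commonly used constant k = 60, and ties are broken by ID only to make the output deterministic.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper, not from the article: Reciprocal Rank Fusion over ranked ID lists. */
final class RrfFusion {
    static final int K = 60;

    /** Fuses any number of best-first ranked lists and returns the top N IDs. */
    @SafeVarargs
    static List<String> fuse(int topN, List<String>... rankedLists) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranked : rankedLists) {
            for (int i = 0; i < ranked.size(); i++) {
                int rank = i + 1;  // 1-based rank within this list
                scores.merge(ranked.get(i), 1.0 / (K + rank), Double::sum);
            }
        }
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Double>comparingByKey()))
                .limit(topN)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

For the vector ranking `[a, b, c]` and the BM25 ranking `[b, d, a]`, `fuse(3, ...)` returns `[b, a, d]`: documents found by both retrievers outrank those found by only one, which is the behavior that pushes a keyword-only or vector-only near miss down the list.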

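Response generation: don't let the LLM "improvise"

This is where a customer service system most easily goes wrong. LLMs sometimes "hallucinate" and invent policies that do not exist. For example, when a user asks "Do you support 7-day no-reason returns?", the model might answer "Yes, we support 30-day no-reason returns", giving the user wrong information with serious consequences.

My solution is strict factual constraints: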

@Service
public class ResponseGenerationService {
    
    @Autowired
    private ChatClient chatClient;
    
    public GeneratedResponse generate(
            ConversationSession session,
            String userMessage,
            IntentResult intent,
            List<KnowledgeChunk> knowledgeContext,
            UserProfile userProfile) {
        
        String systemPrompt = buildSystemPrompt(session.getBusinessLine());
        String userPrompt = buildUserPrompt(
            userMessage, intent, knowledgeContext, userProfile, session
        );
        
        ChatResponse response = chatClient.prompt()
                .system(systemPrompt)
                .user(userPrompt)
                .call()
                .chatResponse();
        
        String rawReply = response.getResult().getOutput().getContent();
        
        // Safety check: make sure the reply makes no factual commitments beyond the knowledge base
        if (containsUnverifiedClaim(rawReply, knowledgeContext)) {
            // Fall back to a template reply
            return fallbackToTemplate(intent);
        }
        
        return GeneratedResponse.builder()
                .content(rawReply)
                .suggestedActions(parseSuggestedActions(rawReply))
                .build();
    }
    
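    private String buildSystemPrompt(String businessLine) {
        return """
                You are the intelligent customer service assistant for %s. Your name is Xiaozhi.
                
                [Code of conduct]
                1. Answer only from the provided knowledge base content; never invent any policy, rule, or figure
                2. If the knowledge base has no relevant information, say "I need to transfer you to a human agent to confirm this"
                3. Keep answers concise and friendly, under 200 characters, with no filler
                4. Do not repeat the user's question
                5. If the user is upset, acknowledge their feelings first, then solve the problem
                6. Any amount or time you state must match the knowledge base exactly; do not derive it yourself
                
                [Strictly forbidden]
                - Promising discounts or policies not explicitly stated in the knowledge base
                - Revealing internal system information
                - Commenting on competitors
                - Discussing sensitive topics such as politics or religion
                """.formatted(businessLine);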
    }
    
    private String buildUserPrompt(
            String userMessage,
            IntentResult intent,
            List<KnowledgeChunk> knowledge,
            UserProfile userProfile,
            ConversationSession session) {
        
        StringBuilder sb = new StringBuilder();
        
        // User info
        sb.append("[User info]\n");
        sb.append("Membership level: ").append(userProfile.getMemberLevel()).append("\n");
        if (session.getRelatedOrderId() != null) {
            sb.append("Related order: ").append(session.getRelatedOrderId()).append("\n");
        }
        
        // Knowledge base content
        sb.append("\n[Reference knowledge base (answer only from this)]\n");
        for (KnowledgeChunk chunk : knowledge) {
            sb.append("---\n").append(chunk.getContent()).append("\n");
        }
        
        // User question
        sb.append("\n[Current user question]\n").append(userMessage);
        
        // Intent hint
        if (intent != null && !intent.getIntentCode().equals("OTHER")) {
            sb.append("\n\n(Detected user intent: ").append(intent.getDescription()).append(")");
        }
        }
        
        return sb.toString();
    }
}
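`containsUnverifiedClaim` is not shown in the article. One cheap check in that spirit, sketched here as an assumption rather than the actual implementation: flag a reply when it contains a number that appears nowhere in the retrieved knowledge. `ClaimChecker` and `containsUnverifiedNumber` are hypothetical names, and knowledge chunks are passed as plain strings.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not from the article: flags numbers that the knowledge base does not contain. */
final class ClaimChecker {
    // Matches integers and decimals written with ASCII digits, such as 7, 30, or 99.5
    private static final Pattern NUMBER = Pattern.compile("\\d+(?:\\.\\d+)?");

    /** Returns true if the reply contains a number that no knowledge text contains. */
    static boolean containsUnverifiedNumber(String reply, List<String> knowledgeTexts) {
        Set<String> knownNumbers = new HashSet<>();
        for (String text : knowledgeTexts) {
            Matcher m = NUMBER.matcher(text);
            while (m.find()) {
                knownNumbers.add(m.group());
            }
        }
        Matcher m = NUMBER.matcher(reply);
        while (m.find()) {
            if (!knownNumbers.contains(m.group())) {
                return true;
            }
        }
        return false;
    }
}
```

Against a knowledge chunk that says returns are accepted within 7 days, a reply promising "30-day" returns is flagged, while a reply that says "within 7 days" passes. This only covers figures written in digits; numbers spelled out in words and non-numeric claims would need a stronger check.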

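Deciding when to hand off to a human

This is an interesting question: when should the user be handed to a human agent?

Too early: the AI adds no value and the human agents are overloaded
Too late: the user has waited too long and is already furious

I designed a multi-dimensional handoff judge: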

@Service
public class TransferJudgmentService {
    
    public TransferDecision judge(ConversationSession session, 
                                   String userMessage,
                                   IntentResult intent) {
        List<TransferReason> reasons = new ArrayList<>();
        int transferScore = 0;
        
        // 1. Scenarios that force a handoff
        if (isForceTransferIntent(intent.getIntentCode())) {
            return TransferDecision.immediate("User explicitly asked for a human agent");
        }
        
        // 2. Emotion
        if (session.getFrustrationLevel() >= 7) {
            transferScore += 40;
            reasons.add(TransferReason.HIGH_FRUSTRATION);
        } else if (intent.getEmotion() == EmotionState.ANGRY) {
            transferScore += 25;
            reasons.add(TransferReason.USER_ANGRY);
        }
        
        // 3. Problem complexity
        if (intent.getConfidence() < 0.5) {
            transferScore += 20;
            reasons.add(TransferReason.LOW_INTENT_CONFIDENCE);
        }
        
        // 4. Long conversation with issues still unresolved
        if (session.getTurnCount() > 8 && !session.getPendingIssues().isEmpty()) {
            transferScore += 30;
            reasons.add(TransferReason.UNRESOLVED_LONG_CONVERSATION);
        }
        
        // 5. Specific high-risk intents
        if (isHighRiskIntent(intent.getIntentCode())) {
            transferScore += 35;
            reasons.add(TransferReason.HIGH_RISK_INTENT);
        }
        
        // 6. User message contains escalation keywords
        if (containsEscalationKeywords(userMessage)) {
            transferScore += 20;
            reasons.add(TransferReason.ESCALATION_KEYWORDS);
        }
        
        if (transferScore >= 50) {
            return TransferDecision.transfer(reasons);
        } else if (transferScore >= 30) {
            return TransferDecision.suggest(reasons); // proactively tell the user a human agent is available
        }
        
        return TransferDecision.noTransfer();
    }
    
    private boolean containsEscalationKeywords(String message) {
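        // Literals stay in Chinese because they are matched against Chinese user messages
        List<String> keywords = List.of(
            "投诉", "举报", "曝光", "律师", "消协",     // complaint, report, expose, lawyer, consumer association
            "退款退款", "骗子", "不给退", "太差了"      // "refund refund", scammer, refund refused, terrible
        );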
        return keywords.stream().anyMatch(message::contains);
    }
    
    private boolean isHighRiskIntent(String intentCode) {
        return Set.of(
            "COMPLAINT_ESCALATION", 
            "LEGAL_THREAT",
            "LARGE_AMOUNT_DISPUTE"  // large-amount dispute
        ).contains(intentCode);
    }
}

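How is the frustration score calculated? I used these rules:

User repeats the same question: +2
AI reply contains "sorry": +1
User uses an exclamation mark: +1
User sends a short message (fewer than 10 characters) that is not a normal brief reply: +1 (usually a sign of sulking)
An issue is successfully resolved: -3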
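The rules above can be sketched as a small scoring function. This is an illustration under assumptions, not the article's code: `FrustrationScorer`, `Signal`, and `update` are hypothetical names, detecting each signal (for example, spotting a repeated question) is left out, and the result is clamped to the 0-10 range documented on `frustrationLevel`.

```java
import java.util.Set;

/** Hypothetical helper, not from the article: applies the frustration scoring rules. */
final class FrustrationScorer {

    /** Signals detected elsewhere for the current turn, each with its score change. */
    enum Signal {
        REPEATED_QUESTION(2),   // user repeats the same question
        AI_APOLOGIZED(1),       // AI reply contains "sorry"
        EXCLAMATION_MARK(1),    // user uses an exclamation mark
        CURT_MESSAGE(1),        // short message that is not a normal brief reply
        ISSUE_RESOLVED(-3);     // an issue was successfully resolved

        final int delta;

        Signal(int delta) {
            this.delta = delta;
        }
    }

    /** Returns the new frustration level, clamped to the range 0-10. */
    static int update(int current, Set<Signal> signals) {
        int next = current;
        for (Signal signal : signals) {
            next += signal.delta;
        }
        return Math.max(0, Math.min(10, next));
    }
}
```

Starting from level 5, a turn with a repeated question and an exclamation mark moves the level to 8, which is past the `>= 7` threshold that `TransferJudgmentService` treats as high frustration.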

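Performance optimization

Intelligent customer service is very latency-sensitive: users expect a reply within 2 seconds. The full pipeline (intent recognition + RAG retrieval + LLM generation) initially took 5-8 seconds, which was completely unacceptable.

Optimization ideas:

1. Run independent steps in parallel (intent recognition alongside the user profile lookup, then knowledge retrieval alongside the business data lookup)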

public GeneratedResponse process(String sessionId, String userId, String message) {
    ConversationSession session = sessionManager.getOrCreate(sessionId, userId);
    
    // Intent recognition and the user profile lookup do not depend on each other, so run them in parallel
    CompletableFuture<IntentResult> intentFuture = CompletableFuture.supplyAsync(
        () -> intentService.recognize(session, message)
    );
    CompletableFuture<UserProfile> profileFuture = CompletableFuture.supplyAsync(
        () -> userService.getProfile(userId)
    );
    
    // join() waits like get() but throws unchecked exceptions, so no throws clause is needed
    IntentResult intent = intentFuture.join();
    UserProfile profile = profileFuture.join();
    
    // Both lookups need the intent, so they start only now and run in parallel with each other
    CompletableFuture<List<KnowledgeChunk>> knowledgeFuture = 
        CompletableFuture.supplyAsync(
            () -> knowledgeService.retrieve(message, session.getBusinessLine(), intent)
        );
    CompletableFuture<BusinessContext> bizFuture = 
        CompletableFuture.supplyAsync(
            () -> bizService.getContext(userId, intent)
        );
    
    List<KnowledgeChunk> knowledge = knowledgeFuture.join();
    BusinessContext bizContext = bizFuture.join();
    
    // Generate the reply (assumes a generate(...) overload that also accepts the BusinessContext)
    return responseService.generate(session, message, intent, knowledge, profile, bizContext);
}

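2. Streaming output

LLM generation is the slowest part of the whole pipeline, so stream the output (SSE) to improve perceived responsiveness: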

public SseEmitter streamResponse(String sessionId, String userId, String message) {
    SseEmitter emitter = new SseEmitter(30000L);  // 30-second timeout
    
    executorService.submit(() -> {
        try {
            // Pre-processing
            ConversationSession session = sessionManager.getOrCreate(sessionId, userId);
            IntentResult intent = intentService.recognize(session, message);
            List<KnowledgeChunk> knowledge = knowledgeService.retrieve(
                message, session.getBusinessLine(), intent);
            
            // Streaming generation
            chatClient.prompt()
                    .system(buildSystemPrompt())
                    .user(buildUserPrompt(message, knowledge, session))
                    .stream()
                    .content()
                    .subscribe(
                        chunk -> {
                            try {
                                emitter.send(SseEmitter.event()
                                        .name("message")
                                        .data(chunk));
                            } catch (IOException e) {
                                // Client is gone: rethrow so the stream is cancelled and the error handler runs
                                throw new UncheckedIOException(e);
                            }
                        },
                        emitter::completeWithError,  // model or network errors also end the SSE stream
                        () -> {
                            try {
                                emitter.send(SseEmitter.event().name("done").data(""));
                                emitter.complete();
                            } catch (IOException e) {
                                emitter.completeWithError(e);
                            }
                        });
                    
        } catch (Exception e) {
            emitter.completeWithError(e);
        }
    });
    
    return emitter;
}

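After these optimizations, time to first token dropped to about 1.2 seconds, and user feedback improved a lot.

Results

Six months after launch, compared with the original rule-based customer service system:

Intent recognition accuracy: up from 61% to 91%
First-contact resolution rate (no human handoff needed): up from 43% to 68%
User satisfaction score: up from 3.2 to 4.1 (on a 5-point scale)
Human agent workload dropped by 42%, and the quality of each case handled by a human went up (because the AI had already filtered out the simple questions)

The biggest unexpected gain: the AI system's conversation logs provided a wealth of user insight. By analyzing high-frequency questions, the product team found and fixed several sticking points in product flows. A rule-based customer service system could never have done that.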