Post 2491: Building a Private Enterprise AI Assistant: Deploying an Intelligent Q&A System on Internal Knowledge

Lao Zhang | 2026/4/30 | About 7 min read

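Intended readers: Java engineers, AI engineers, enterprise architects | Reading time: about 15 minutes | Key takeaway: a complete technical approach to building a private enterprise AI assistant, covering the full path from knowledge ingestion to the Q&A service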

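We once took on a project for a manufacturing company. They wanted an internal AI Q&A system that would let employees query company policies, operating manuals, and product documentation in natural language.

The company had more than 8,000 documents covering production processes, quality standards, equipment operating manuals, HR policies, and more, in a mix of formats: PDF, Word, and Excel.

They had tried ChatGPT before, but ran into two problems: it had no access to the company's internal files, and security and compliance rules prohibited sending internal information to an external API.

So the requirement was clear: on-premises deployment plus RAG-based Q&A over internal documents.

This is currently one of the most common scenarios for enterprise AI adoption, so I have written up the complete approach used in this project.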

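1. Overall Architecture

Key component choices for on-premises deployment:

Embedding model: BGE-M3 (multilingual, covering Chinese and English; good quality; runs locally)
Vector database: Milvus (stable, scales to large collections)
LLM: Qwen2.5-72B or GLM-4 (strong Chinese performance, can be self-hosted)
Reranker: BGE-Reranker (improves retrieval precision)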

2. Document Processing Pipeline

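// Milvus imports follow the package layout of the Milvus Java SDK v2
import io.milvus.client.MilvusClient;
import io.milvus.grpc.MutationResult;
import io.milvus.param.R;
import io.milvus.param.dml.InsertParam;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;

@Service
@Slf4j
@RequiredArgsConstructor // Lombok generates the constructor Spring uses to inject the final fields
public class DocumentIngestionPipeline {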

    private final TextExtractorFactory extractorFactory;
    private final DocumentChunker chunker;
    private final EmbeddingService embeddingService;
    private final MilvusClient milvusClient;
    private final DocumentMetadataRepository metadataRepo;

    // Ingest a single document
    public DocumentIngestionResult ingest(DocumentSource source) {
        log.info("Starting document ingestion: {}", source.getFileName());

        try {
            // 1. Text extraction (the extractor is chosen by file format)
            TextExtractor extractor = extractorFactory.getExtractor(source.getFileType());
            ExtractedDocument extracted = extractor.extract(source.getFilePath());

            // 2. Text cleaning
            String cleanedText = cleanText(extracted.getText());

            // 3. Metadata extraction
            DocumentMetadata metadata = extractMetadata(source, extracted);

            // 4. Smart chunking
            List<DocumentChunk> chunks = chunker.chunk(cleanedText, metadata);

            // 5. Generate embeddings in batch
            List<float[]> embeddings = embeddingService.batchEmbed(
                chunks.stream().map(DocumentChunk::getText).collect(Collectors.toList())
            );

            // 6. Write to the vector store
            insertToMilvus(chunks, embeddings, metadata);

            // 7. Persist the metadata
            metadata.setChunkCount(chunks.size());
            metadataRepo.save(metadata);

            log.info("Document ingested: {}, {} chunks", source.getFileName(), chunks.size());

            return DocumentIngestionResult.success(source.getFileName(), chunks.size());

        } catch (Exception e) {
            log.error("Document ingestion failed: {}", source.getFileName(), e);
            return DocumentIngestionResult.failed(source.getFileName(), e.getMessage());
        }
    }

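    // Text cleaning: strip noise and normalize layout
    private String cleanText(String rawText) {
        return rawText
            // Collapse any run of 3+ whitespace characters into one paragraph break
            .replaceAll("\\s{3,}", "\n\n")
            // Drop page-number lines left over from headers and footers:
            // "第N页" (Chinese for "Page N") and lines that contain only digits.
            // Note: the digits-only rule also removes legitimate numeric-only lines.
            .replaceAll("(?m)^第\\d+页$", "")
            .replaceAll("(?m)^\\d+\\s*$", "")
            // Fix common OCR misreads (customize for your own corpus)
            .replace("I末", "I未")
            .trim();
    }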

    // Metadata extraction
    private DocumentMetadata extractMetadata(DocumentSource source, ExtractedDocument doc) {
        return DocumentMetadata.builder()
            // Assumption: documentId is not generated elsewhere. It has to be set here
            // because both the chunker and insertToMilvus read it before the metadata is saved.
            .documentId(UUID.randomUUID().toString())
            .fileName(source.getFileName())
            .fileType(source.getFileType())
            .department(inferDepartment(source.getFilePath()))
            .documentType(inferDocumentType(source.getFileName(), doc.getText()))
            .createdDate(source.getCreatedDate())
            .lastModified(source.getLastModified())
            .accessLevel(source.getAccessLevel())
            .build();
    }

    private void insertToMilvus(
            List<DocumentChunk> chunks, 
            List<float[]> embeddings,
            DocumentMetadata metadata) {
        
        List<String> ids = new ArrayList<>();
        List<String> docIds = new ArrayList<>();
        List<String> texts = new ArrayList<>();
        List<String> departments = new ArrayList<>();
        List<String> docTypes = new ArrayList<>();
        List<List<Float>> embeddingsList = new ArrayList<>();

        for (int i = 0; i < chunks.size(); i++) {
            DocumentChunk chunk = chunks.get(i);
            ids.add(UUID.randomUUID().toString());
            docIds.add(metadata.getDocumentId());
            texts.add(chunk.getText());
            departments.add(metadata.getDepartment());
            docTypes.add(metadata.getDocumentType());
            
            float[] emb = embeddings.get(i);
            List<Float> embList = new ArrayList<>();
            for (float v : emb) embList.add(v);
            embeddingsList.add(embList);
        }

        InsertParam insertParam = InsertParam.newBuilder()
            .withCollectionName("enterprise_knowledge")
            .withFields(Arrays.asList(
                new InsertParam.Field("chunk_id", ids),
                new InsertParam.Field("doc_id", docIds),
                new InsertParam.Field("text", texts),
                new InsertParam.Field("department", departments),
                new InsertParam.Field("doc_type", docTypes),
                new InsertParam.Field("embedding", embeddingsList)
            ))
            .build();

        R<MutationResult> result = milvusClient.insert(insertParam);
        if (result.getStatus() != R.Status.Success.getCode()) {
            throw new RuntimeException("Milvus insert failed: " + result.getMessage());
        }
    }
}
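
Step 1 of the pipeline picks an extractor by file format, but the article does not show `TextExtractorFactory`. The following is a minimal sketch of one way such a dispatch could look, assuming extractors are registered by file extension. `ExtractorRegistry` and its methods are hypothetical names introduced for illustration, and the plain-text extractor merely stands in for real PDF, Word, or Excel parsers.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: not the article's TextExtractorFactory.
class ExtractorRegistry {
    // Maps a lowercase file extension to a function that turns raw bytes into text.
    private final Map<String, Function<byte[], String>> extractors = new HashMap<>();

    void register(String extension, Function<byte[], String> extractor) {
        extractors.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    // Picks the extractor from the file name's extension; fails loudly on unknown formats
    // so that unsupported files are reported instead of silently indexed as empty text.
    Function<byte[], String> forFile(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        Function<byte[], String> extractor = extractors.get(ext);
        if (extractor == null) {
            throw new IllegalArgumentException("Unsupported file type: " + fileName);
        }
        return extractor;
    }
}

class ExtractorRegistryDemo {
    public static void main(String[] args) {
        ExtractorRegistry registry = new ExtractorRegistry();
        // A trivial plain-text extractor; format-specific parsers would be registered the same way.
        registry.register("txt", bytes -> new String(bytes, StandardCharsets.UTF_8));

        byte[] content = "Safety manual, section 1".getBytes(StandardCharsets.UTF_8);
        System.out.println(registry.forFile("manual.TXT").apply(content));
    }
}
```

Failing on unknown extensions fits the pipeline's existing try/catch: the document is recorded as a failed ingestion rather than producing empty chunks.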

2.1 Smart Chunking Strategy

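import org.springframework.stereotype.Component;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

@Component
public class DocumentChunker {

    private static final int TARGET_CHUNK_SIZE = 400;   // target chunk size (characters)
    private static final int MAX_CHUNK_SIZE = 600;       // largest section kept as a single chunk
    private static final int OVERLAP_SIZE = 50;          // characters shared between adjacent chunks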

    public List<DocumentChunk> chunk(String text, DocumentMetadata metadata) {
        // Split on semantic boundaries first (headings, paragraphs)
        List<String> sections = splitBySemantic(text);

        List<DocumentChunk> chunks = new ArrayList<>();

        for (String section : sections) {
            if (section.length() <= MAX_CHUNK_SIZE) {
                // The section is already small enough: keep it as a single chunk
                chunks.add(new DocumentChunk(section, metadata.getDocumentId()));
            } else {
                // The section is too long: split it further at sentence boundaries
                List<String> subChunks = splitBySize(section);
                subChunks.stream()
                    .map(c -> new DocumentChunk(c, metadata.getDocumentId()))
                    .forEach(chunks::add);
            }
        }

        // Add cross-chunk context: the tail of the previous chunk becomes the prefix of the current one
        return addOverlap(chunks);
    }

    private List<String> splitBySemantic(String text) {
        // Split before Markdown-style headings (1 to 4 '#') and at blank lines between paragraphs
        return Arrays.stream(text.split("(?=\\n#{1,4}\\s)|(?<=\\n)(?=\\n)"))
            .filter(s -> !s.isBlank())
            .collect(Collectors.toList());
    }

    private List<String> splitBySize(String section) {
        List<String> chunks = new ArrayList<>();
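        // Split after Chinese or Western sentence-ending punctuation.
        // '.' also matches decimal points (e.g. "3.5"), so numeric text can be split mid-number.
        String[] sentences = section.split("(?<=[。！？.!?])");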
        
        StringBuilder current = new StringBuilder();
        for (String sentence : sentences) {
            if (current.length() + sentence.length() > TARGET_CHUNK_SIZE 
                    && current.length() > 0) {
                chunks.add(current.toString());
                current = new StringBuilder();
            }
            current.append(sentence);
        }
        if (!current.isEmpty()) chunks.add(current.toString());
        
        return chunks;
    }

    private List<DocumentChunk> addOverlap(List<DocumentChunk> chunks) {
        List<DocumentChunk> result = new ArrayList<>();
        
        for (int i = 0; i < chunks.size(); i++) {
            String text = chunks.get(i).getText();
            if (i > 0) {
                String prevText = chunks.get(i - 1).getText();
                String overlap = prevText.substring(
                    Math.max(0, prevText.length() - OVERLAP_SIZE));
                text = overlap + text;
            }
            result.add(new DocumentChunk(text, chunks.get(i).getDocumentId()));
        }
        
        return result;
    }
}
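
To make the behavior of the size-based split and the overlap step easier to see, here is a small standalone sketch of the same two steps with the sizes passed in as parameters. The class name `ChunkingSketch`, the sample text, and the tiny sizes (30 and 6) are illustrative assumptions; the logic mirrors `splitBySize` and `addOverlap` above but works on plain strings.

```java
import java.util.ArrayList;
import java.util.List;

class ChunkingSketch {
    // Greedy sentence packing: start a new chunk when adding the next sentence
    // would push the current chunk past targetSize.
    static List<String> splitBySize(String text, int targetSize) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String sentence : text.split("(?<=[。！？.!?])")) {
            if (current.length() + sentence.length() > targetSize && current.length() > 0) {
                chunks.add(current.toString());
                current = new StringBuilder();
            }
            current.append(sentence);
        }
        if (current.length() > 0) chunks.add(current.toString());
        return chunks;
    }

    // Prefix each chunk (except the first) with the last `overlap` characters
    // of the previous, un-prefixed chunk.
    static List<String> addOverlap(List<String> chunks, int overlap) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < chunks.size(); i++) {
            String text = chunks.get(i);
            if (i > 0) {
                String prev = chunks.get(i - 1);
                text = prev.substring(Math.max(0, prev.length() - overlap)) + text;
            }
            result.add(text);
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Wear gloves. Check the valve. Start the pump. Log the reading.";
        List<String> chunks = addOverlap(splitBySize(text, 30), 6);
        chunks.forEach(c -> System.out.println("[" + c + "]"));
    }
}
```

With these inputs the text is packed into three chunks, and the second chunk starts with "valve.", the tail of the first. Note that the overlap is added on top of the size limit, so a final chunk can be up to `OVERLAP_SIZE` characters longer than the split alone would produce.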

3. Q&A Service Implementation

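import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;

import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

@Service
@Slf4j
@RequiredArgsConstructor // Lombok generates the constructor Spring uses to inject the final fields
public class EnterpriseQAService {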

    private final HybridSearchService searchService;
    private final RerankerService rerankerService;
    private final PrivateLLMClient llmClient;
    private final QueryUnderstandingService queryUnderstanding;
    private final QAHistoryRepository historyRepo;

    public QAResponse answer(QARequest request) {
        log.info("Handling question: {}", request.getQuestion());

        // 1. Understand and rewrite the question
        QueryAnalysis analysis = queryUnderstanding.analyze(request.getQuestion());

        // 2. Hybrid retrieval
        List<RetrievedChunk> candidates = searchService.search(
            analysis.getOptimizedQuery(),
            buildSearchFilter(request.getUserContext(), analysis)
        );

        // 3. Rerank to improve relevance, keeping the top 5
        List<RetrievedChunk> reranked = rerankerService.rerank(
            analysis.getOptimizedQuery(), candidates, 5);

        // 4. Check whether there is enough information to answer.
        //    The 0.3 cutoff assumes reranker scores normalized to [0, 1]; calibrate it for your model.
        if (reranked.isEmpty() || reranked.get(0).getScore() < 0.3) {
            return QAResponse.noAnswer(
                "Sorry, I could not find any information related to your question in the knowledge base. " +
                "Try rephrasing the question, or contact the relevant department for help."
            );
        }

        // 5. Generate the answer
        String answer = generateAnswer(request.getQuestion(), reranked, analysis);

        // 6. Extract the references
        List<DocumentReference> references = extractReferences(reranked);

        // 7. Record the Q&A history (for improvement and auditing)
        historyRepo.save(QAHistory.builder()
            .userId(request.getUserId())
            .question(request.getQuestion())
            .answer(answer)
            .references(references)
            .createdAt(Instant.now())
            .build());

        return QAResponse.builder()
            .answer(answer)
            .references(references)
            .confidence(calculateConfidence(reranked))
            .build();
    }

    private String generateAnswer(
            String question, 
            List<RetrievedChunk> context,
            QueryAnalysis analysis) {
        
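        // Build the context text, labeling each chunk with its source file
        String contextText = context.stream()
            .map(c -> String.format("[Source: %s]\n%s", c.getSourceFileName(), c.getText()))
            .collect(Collectors.joining("\n\n---\n\n"));

        String systemPrompt = """
            You are an internal enterprise knowledge Q&A assistant. Answer the question using the reference material provided.
            
            Rules:
            1. Answer only from the reference material; do not speculate beyond it
            2. If the reference material is not enough to fully answer the question, state clearly which part cannot be answered
            3. Cite the source file name in your answer
            4. Use clear formatting, with lists where helpful
            5. If the question involves a procedure, list the steps in numbered order
            """;

        String userPrompt = String.format("""
            Reference material:
            %s
            
            Question: %s
            """, contextText, question);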

        return llmClient.chat(systemPrompt, userPrompt);
    }

    private double calculateConfidence(List<RetrievedChunk> chunks) {
        if (chunks.isEmpty()) return 0;
        return chunks.stream()
            .mapToDouble(RetrievedChunk::getScore)
            .average()
            .orElse(0);
    }

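    private List<DocumentReference> extractReferences(List<RetrievedChunk> chunks) {
        // Deduplicate by document. Chunks arrive sorted by relevance, so the first chunk seen
        // for a document is its best one. distinct() alone only works if
        // DocumentReference.equals compares documentId and nothing else, because references
        // built from different chunks differ in score and snippet.
        java.util.Set<String> seenDocIds = new java.util.HashSet<>();
        return chunks.stream()
            .filter(chunk -> seenDocIds.add(chunk.getDocumentId()))
            .map(chunk -> DocumentReference.builder()
                .documentId(chunk.getDocumentId())
                .fileName(chunk.getSourceFileName())
                .relevanceScore(chunk.getScore())
                .snippet(chunk.getText().substring(0, Math.min(100, chunk.getText().length())))
                .build())
            .collect(Collectors.toList());
    }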
}
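
The article does not show how `HybridSearchService` merges its vector and keyword results. One widely used option is Reciprocal Rank Fusion (RRF), sketched below as an assumption rather than as this project's actual method. `RrfSketch`, the chunk IDs, and the constant `k = 60` (a commonly used default) are illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RrfSketch {
    // Reciprocal Rank Fusion: score(d) = sum over result lists of 1 / (k + rank),
    // where rank starts at 1. Only ranks are used, so the two retrievers'
    // raw scores do not need to be on the same scale.
    static List<String> fuse(List<List<String>> rankedLists, int k) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (List<String> list : rankedLists) {
            for (int i = 0; i < list.size(); i++) {
                scores.merge(list.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        List<String> ids = new ArrayList<>(scores.keySet());
        ids.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return ids;
    }

    public static void main(String[] args) {
        List<String> vectorHits = List.of("c1", "c2", "c3");   // ranked by embedding similarity
        List<String> keywordHits = List.of("c3", "c1", "c4");  // ranked by keyword match
        System.out.println(fuse(List.of(vectorHits, keywordHits), 60));
    }
}
```

In this example `c1` and `c3` appear in both lists and therefore rank ahead of `c2` and `c4`, which each appear in only one. The fused candidates would then go to the reranker as in step 3 above.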

4. Query Understanding and Rewriting

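import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Service;

@Service
@RequiredArgsConstructor
public class QueryUnderstandingService {

    // The article's LLM client; call(String) is expected to return the raw model reply
    private final ChatClient chatClient;
    // Jackson mapper used to parse the model's JSON reply
    private final ObjectMapper objectMapper;

    public QueryAnalysis analyze(String question) {
        String prompt = String.format("""
            Analyze the user question below and return JSON in this format:
            
            Question: %s
            
            {
                "intent": "question type (information lookup / how-to guidance / policy query / other)",
                "optimized_query": "rewritten query (better suited to retrieval; extract the keywords)",
                "entities": ["list of key entities"],
                "suggested_filters": {
                    "department": "relevant department (if it can be inferred)",
                    "doc_type": "document type (if it can be inferred)"
                }
            }
            
            Return only the JSON.
            """, question);

        try {
            String response = chatClient.call(prompt);
            return objectMapper.readValue(response, QueryAnalysis.class);
        } catch (Exception e) {
            // If the call or the JSON parsing fails, fall back to the original question
            return QueryAnalysis.simple(question);
        }
    }
}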
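
Even when told to return only JSON, a model may wrap the object in a code fence or add a sentence before or after it, which would send the parse above straight to the fallback. A minimal sketch of a pre-parse cleanup step follows, assuming the reply contains exactly one top-level JSON object. `JsonExtractSketch` and `extractJsonObject` are hypothetical names, not part of the article's code.

```java
class JsonExtractSketch {
    // Returns the text from the first '{' to the last '}' inclusive,
    // or null when the reply contains no such span.
    static String extractJsonObject(String response) {
        if (response == null) return null;
        int start = response.indexOf('{');
        int end = response.lastIndexOf('}');
        if (start < 0 || end <= start) return null;
        return response.substring(start, end + 1);
    }

    public static void main(String[] args) {
        String reply = "Here is the analysis:\n{\"intent\": \"policy query\"}\nHope this helps.";
        System.out.println(extractJsonObject(reply));
    }
}
```

The extracted string would then be passed to `objectMapper.readValue`, with a `null` result handled the same way as a parse failure.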

5. Deployment and Operations Notes

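Private enterprise deployments raise a few issues that need special handling:

Document access control: different employees may see different sets of documents. HR documents can be queried only by HR and management, and technical documents are isolated by department. This must be enforced at the retrieval layer; filtering only at the presentation layer is not enough, because restricted text would already have reached the LLM.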
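
As a sketch of what retrieval-layer enforcement could look like, the helper below builds a Milvus-style boolean filter expression from a user's permissions. It assumes the collection has the `department` field written by the ingestion code plus a hypothetical integer `access_level` field, which the article's insert code does not currently store. `AccessFilterSketch` and the escaping rule are illustrative assumptions as well.

```java
import java.util.List;
import java.util.stream.Collectors;

class AccessFilterSketch {
    // Builds an expression such as: department in ["HR", "Management"] and access_level <= 2
    static String buildExpr(List<String> allowedDepartments, int maxAccessLevel) {
        if (allowedDepartments.isEmpty()) {
            // With no permitted departments the caller should skip the search entirely.
            throw new IllegalArgumentException("User has no permitted departments");
        }
        String depts = allowedDepartments.stream()
            // Conservative escaping (an assumption) so a department name cannot break out of the literal
            .map(d -> "\"" + d.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
            .collect(Collectors.joining(", "));
        return "department in [" + depts + "] and access_level <= " + maxAccessLevel;
    }

    public static void main(String[] args) {
        System.out.println(buildExpr(List.of("HR", "Management"), 2));
    }
}
```

The resulting string would be attached to the vector search request as its filter expression, so chunks the user may not see are never retrieved. The permitted departments and level should come from the authenticated user context, never from the question text or the LLM's suggested filters.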

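Incremental updates: documents keep changing, so the index must support incremental updates (refreshing the vector store in real time or on a schedule when documents are added, modified, or deleted) rather than a full rebuild every time.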
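
One way to decide what needs re-indexing is to compare a content hash per document against the hash recorded at the last ingestion. The sketch below assumes such hashes are stored alongside the document metadata, which the article does not state. `IncrementalSyncSketch`, `Action`, and `plan` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

class IncrementalSyncSketch {
    enum Action { ADD, UPDATE, DELETE }

    // Hex-encoded SHA-256 of the file content
    static String sha256(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // indexed: docId -> hash recorded at the last ingestion
    // current: docId -> hash of the file as it exists now
    // Unchanged documents do not appear in the result.
    static Map<String, Action> plan(Map<String, String> indexed, Map<String, String> current) {
        Map<String, Action> actions = new TreeMap<>();
        current.forEach((id, hash) -> {
            String old = indexed.get(id);
            if (old == null) actions.put(id, Action.ADD);
            else if (!old.equals(hash)) actions.put(id, Action.UPDATE);
        });
        for (String id : indexed.keySet()) {
            if (!current.containsKey(id)) actions.put(id, Action.DELETE);
        }
        return actions;
    }

    public static void main(String[] args) {
        Map<String, String> indexed = Map.of(
            "hr-policy", sha256("v1".getBytes(StandardCharsets.UTF_8)),
            "old-manual", sha256("legacy".getBytes(StandardCharsets.UTF_8)));
        Map<String, String> current = Map.of(
            "hr-policy", sha256("v2".getBytes(StandardCharsets.UTF_8)),
            "new-sop", sha256("draft".getBytes(StandardCharsets.UTF_8)));
        System.out.println(plan(indexed, current));
    }
}
```

For `UPDATE`, the old chunks for that `doc_id` would be removed from the vector store before the document runs through the ingestion pipeline again; for `DELETE`, only the removal is needed.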

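Answer quality monitoring: record users' ratings of answers (thumbs up / thumbs down), review the causes of negative ratings regularly, and keep improving.

Response time: a self-hosted 70B-class model still needs several seconds per answer on an A100, so streaming the output makes the wait feel shorter.

After launch, employees found policy lookups noticeably faster: questions that used to take a long search through documents now get an answer in seconds. Most importantly, users trust the system, because every answer can be traced back to its source documents.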