Advanced Vector Databases: HNSW Index Internals and Tuning in Practice

Lao Zhang · 2026/10/24 · about 23 min read · Vector Database · HNSW · Index Tuning · Qdrant · Spring AI · Java


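The Parameter Change That Made Queries 5x Faster

In September 2025, Wang Jianfeng, the backend lead at an online education platform, ran into a problem that drove him up the wall.

Their knowledge-base system had been live for only three months and had already accumulated 1.5 million knowledge-point vectors. Right after launch, a vector search took just 50 ms. As the data grew, however, search time ballooned to over 600 ms within those three months. Latency approaching one second was a disaster for the user experience.

Ops proposed adding servers, which would mean an extra 12,000 yuan per month in cloud costs.

Wang felt something was off: the data volume had only tripled, so why had latency grown twelvefold?

He started digging into their HNSW index configuration, which was:

M = 16, ef_construction = 100, ef = 10

This is the "default configuration" found in many tutorials. But Wang realized that ef = 10 was too small for their scenario, and that this parameter directly determines the balance between query accuracy and speed.

After two days of tuning experiments:

Parameters	Search latency (P99)	Recall	Memory
M=16, ef=10 (original)	620ms	72%	8.2GB
M=16, ef=64 (optimization 1)	180ms	94%	8.2GB
M=24, ef=64 (optimization 2)	130ms	96%	9.8GB
M=32, ef=128 (final)	118ms	98%	11.2GB

Final result: P99 latency fell from 620 ms to 118 ms, more than 5x faster, and recall rose from 72% to 98%. No scale-out, and none of that 12,000 yuan per month.

At a company tech talk, Wang said:

"I spent two days studying HNSW and saved about 140,000 yuan a year in server costs. That return on investment beats any other technical optimization."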

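HNSW Internals: An Analogy with Java Data Structures

HNSW (Hierarchical Navigable Small World) is currently the most widely used approximate nearest neighbor (ANN) index algorithm. Mainstream vector databases such as Qdrant, Milvus, and Weaviate all use it.

The easiest way to understand HNSW is by analogy with a data structure Java developers already know.

Analogy: the layered skip list

You have almost certainly used ConcurrentSkipListMap. The core idea of a skip list is to maintain indexes of different densities on different levels. Upper levels are sparse and allow fast long jumps, while lower levels are dense and allow precise positioning.

HNSW follows almost the same idea. The upper layers are sparse graphs whose long links let a search quickly approach the target region. Layer 0 contains every vector and uses short links to pinpoint the nearest neighbors.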
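The skip-list analogy is most literal in how nodes are assigned to layers. The sketch below assumes the level-assignment rule from the original HNSW paper: level = floor(-ln(U) · mL), with mL = 1/ln(M) and U uniform in (0, 1]. The class name HnswLevelSketch is hypothetical, and production engines may tune this rule differently. Under this rule, each layer keeps roughly 1/M of the nodes of the layer below it, much like a skip list thins out from level to level.

```java
import java.util.Random;

/**
 * Hypothetical sketch of HNSW layer assignment, using the rule from the HNSW paper:
 * level = floor(-ln(U) * mL), mL = 1 / ln(M). Illustrative only.
 */
class HnswLevelSketch {

    /** Draws the top layer for a new node; most nodes only live on layer 0. */
    static int randomLevel(Random rnd, int m) {
        double mL = 1.0 / Math.log(m);
        // 1 - nextDouble() lies in (0, 1], so the logarithm is always finite
        double u = 1.0 - rnd.nextDouble();
        return (int) Math.floor(-Math.log(u) * mL);
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int m = 16;
        int n = 1_000_000;
        int[] nodesPerLayer = new int[16];
        for (int i = 0; i < n; i++) {
            int level = randomLevel(rnd, m);
            // A node with top layer L is present on every layer 0..L
            for (int l = 0; l <= Math.min(level, nodesPerLayer.length - 1); l++) {
                nodesPerLayer[l]++;
            }
        }
        // Each layer should hold roughly 1/M of the nodes of the layer below
        for (int l = 0; l < 5; l++) {
            System.out.printf("layer %d: %d nodes%n", l, nodesPerLayer[l]);
        }
    }
}
```

The probability of reaching layer 1 or higher is P(U ≤ 1/M) = 1/M, which produces the same geometric thinning a skip list gets from coin flips.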

The query process:

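import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

/**
 * Java pseudo-implementation of HNSW search (to help understand the algorithm)
 *
 * This is not a real HNSW implementation: production engines implement it natively
 * (e.g. Rust in Qdrant, C++ in Milvus, Go in Weaviate). It uses Java to illustrate
 * how an HNSW search proceeds. Graph construction is out of scope, so subclasses
 * supply the entry point and the number of layers.
 */
public abstract class HNSWSearchIllustration {

    /** A graph node: its vector plus its neighbor list on each layer. */
    public interface Node {
        float[] getVector();
        List<Node> getNeighborsAtLayer(int layer);
    }

    /** Entry point of the graph (a node on the top layer). */
    protected abstract Node getEntryPoint();

    /** Index of the top layer. */
    protected abstract int getMaxLayer();

    /**
     * HNSW search: go from the top layer down, narrowing the candidate region layer by layer
     *
     * Analogy:
     * - Upper layers = highways (few nodes, long hops, quickly approach the target region)
     * - Lower layers = side streets (many nodes, short hops, pinpoint the nearest neighbors)
     *
     * @param query query vector
     * @param k     number of nearest neighbors to return
     * @param ef    size of the candidate set kept during search (larger = more accurate but slower)
     */
    public List<Node> search(float[] query, int k, int ef) {
        // Start from the entry point on the top layer
        Node currentNode = getEntryPoint();
        int maxLayer = getMaxLayer();

        // Phase 1: greedy descent (from the top layer down to layer 1)
        // On each layer, greedily move to the neighbor closest to the query
        for (int layer = maxLayer; layer >= 1; layer--) {
            // The closest node found on this layer becomes the entry point of the next one
            currentNode = greedySearch(query, currentNode, layer);
        }

        // Phase 2: fine-grained search (layer 0)
        // Keep a candidate set of size ef on the bottom layer and run a beam search;
        // an ef smaller than k could not hold k results, so use at least k
        PriorityQueue<Node> candidates = beamSearch(query, currentNode, Math.max(ef, k), 0);

        // Return the k closest nodes from the candidate set
        return selectTopK(query, candidates, k);
    }

    /**
     * Greedy search: find the node closest to the query on the given layer
     * Analogy: on a sparse map, always walk in whichever direction gets you closer
     */
    private Node greedySearch(float[] query, Node entry, int layer) {
        Node current = entry;
        float currentDist = cosineDistance(query, current.getVector());
        boolean improved = true;

        while (improved) {
            improved = false;
            // Check all neighbors of the current node on this layer
            for (Node neighbor : current.getNeighborsAtLayer(layer)) {
                float neighborDist = cosineDistance(query, neighbor.getVector());
                if (neighborDist < currentDist) {
                    current = neighbor;
                    currentDist = neighborDist;
                    improved = true;  // found a closer node; keep iterating
                }
            }
        }

        return current;  // local optimum on this layer
    }

    /**
     * Beam search: keep a candidate set on the bottom layer and search more broadly
     * Analogy: instead of following a single road, explore up to ef promising roads at once
     *
     * Larger ef -> more roads explored -> higher chance of finding the true nearest neighbors -> slower
     */
    private PriorityQueue<Node> beamSearch(float[] query, Node entry,
                                           int ef, int layer) {
        // Candidates to expand: min-heap by distance (closest first).
        // Distances are recomputed in the comparators for clarity; real engines cache them.
        PriorityQueue<Node> candidates = new PriorityQueue<>(
            Comparator.comparingDouble(n -> cosineDistance(query, n.getVector())));

        // Results: max-heap by distance holding the ef closest nodes (peek() = furthest)
        PriorityQueue<Node> results = new PriorityQueue<>(ef,
            Comparator.comparingDouble(n -> -cosineDistance(query, n.getVector())));

        Set<Node> visited = new HashSet<>();

        candidates.add(entry);
        results.add(entry);
        visited.add(entry);

        while (!candidates.isEmpty()) {
            Node current = candidates.poll();
            Node furthestResult = results.peek();

            // Pruning: if the closest remaining candidate is farther than the furthest result, stop
            if (cosineDistance(query, current.getVector()) >
                cosineDistance(query, furthestResult.getVector())) {
                break;
            }

            // Expand neighbors
            for (Node neighbor : current.getNeighborsAtLayer(layer)) {
                if (visited.add(neighbor)) {   // add() returns false if already visited
                    float dist = cosineDistance(query, neighbor.getVector());

                    if (results.size() < ef
                            || dist < cosineDistance(query, results.peek().getVector())) {
                        candidates.add(neighbor);
                        results.add(neighbor);

                        // Keep at most ef results by dropping the furthest one
                        if (results.size() > ef) {
                            results.poll();
                        }
                    }
                }
            }
        }

        return results;
    }

    /** Sorts the candidates by ascending distance and keeps the first k. */
    private List<Node> selectTopK(float[] query, PriorityQueue<Node> candidates, int k) {
        List<Node> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble(n -> cosineDistance(query, n.getVector())));
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    private float cosineDistance(float[] a, float[] b) {
        // Cosine distance (real implementations optimize this, e.g. with SIMD)
        float dotProduct = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dotProduct += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1 - dotProduct / (float) (Math.sqrt(normA) * Math.sqrt(normB));
    }
}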
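To judge how well an approximate search like the one above performs, you need the exact answer to compare against. The following is a minimal sketch of exact (brute-force, "Flat") k-nearest-neighbor search over an in-memory array. BruteForceKnn is a hypothetical helper, and its O(n·d) cost per query is exactly what HNSW is designed to avoid.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

/**
 * Hypothetical helper: exact k-NN by cosine distance, the baseline HNSW approximates
 * and the usual ground truth when measuring recall. O(n * d) per query.
 */
class BruteForceKnn {

    static double cosineDistance(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            na += (double) a[i] * a[i];
            nb += (double) b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    /** Returns the indices of the k rows of data closest to query, closest first. */
    static int[] topK(float[][] data, float[] query, int k) {
        double[] dist = new double[data.length];
        for (int i = 0; i < data.length; i++) {
            dist[i] = cosineDistance(query, data[i]);
        }
        return IntStream.range(0, data.length)
                .boxed()
                .sorted(Comparator.comparingDouble(i -> dist[i]))
                .limit(k)
                .mapToInt(Integer::intValue)
                .toArray();
    }

    public static void main(String[] args) {
        float[][] data = {{1f, 0f}, {0f, 1f}, {1f, 1f}};
        int[] nearest = topK(data, new float[]{1f, 0.1f}, 2);
        System.out.println(Arrays.toString(nearest));   // [0, 2]
    }
}
```

Comparing an HNSW result list against topK() for the same query gives recall@k directly. This is also what Qdrant's exact=true search mode provides on the server side.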

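Key Parameters in Depth

Once the algorithm is clear, the parameters make much more sense.

Parameter M: maximum number of neighbors per node

Analogy: in a social network, each person can follow at most M people
- Small M → sparse connections → fewer people reachable in one hop → more hops needed to reach the target
- Large M → dense connections → more people reachable in one hop → fewer hops, but more space needed to store the connections

M	Typical scenario	Memory impact	Query speed	Build time
8-12	Limited memory, modest accuracy needs	Low	Slower	Fast
16 (default)	Balanced workloads	Medium	Medium	Medium
24-32	High accuracy requirements	Medium-high	Fast	Slow
48-64	Very high accuracy, plenty of memory	High	Very fast	Very slow

Parameter ef_construction: candidate-set size during index construction

Analogy: when drawing a map, how much time you are willing to spend finding the "best road connections"
- Small ef_construction → the map is finished quickly, but some connections are not optimal
- Large ef_construction → the map takes longer to build, but the connections are better and later lookups are faster

ef_construction is not a query-time parameter. Once the index is built, it no longer affects query speed directly. It only determines the quality of the graph built offline, which in turn limits the recall achievable at a given ef.

ef_construction	Index quality	Build time	Query recall (at fixed ef)
50	Fair	Fast	Lower
100 (default)	Good	Medium	Medium
200	Excellent	~2x slower	High
400	Outstanding	~4x slower	Very high

Parameter ef (ef_search): candidate-set size at query time

This is the most important query parameter: it directly determines the balance between recall and speed.

Analogy: you are looking for a restaurant in a city, and ef is the number of restaurants you are willing to try
- ef=10 → try only 10: fast, but you may miss the best one
- ef=100 → try 100: you will almost certainly find the best one, but it takes more time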

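Performance Data: The Recall / Speed / Memory Trade-off

The data below is based on 1.5 million 1536-dimensional vectors (the output dimension of OpenAI text-embedding-3-small):

M comparison (ef_construction=200, ef=100 fixed)

M	Memory	Build time	P50 latency	P99 latency	Recall@10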
8	7.2GB	45min	28ms	95ms	89.2%
16	8.8GB	62min	18ms	62ms	94.1%
24	10.5GB	85min	14ms	48ms	96.8%
32	12.3GB	115min	12ms	38ms	97.9%
48	16.1GB	180min	11ms	33ms	98.4%

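Conclusion: M=32 is the sweet spot between accuracy and memory. Beyond it, the returns diminish noticeably.

ef comparison (M=32, ef_construction=200 fixed)

ef	P50 latency	P99 latency	Recall@10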
10	5ms	18ms	78.3%
32	8ms	28ms	91.2%
64	12ms	38ms	96.5%
128	22ms	72ms	98.6%
256	45ms	140ms	99.2%

Conclusion: ef=64 is the sweet spot between speed and accuracy and suits most RAG scenarios.

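Qdrant HNSW Configuration: Setting the Parameters with Spring AI

Maven dependencies

<!-- Versions are managed by the Spring AI BOM (spring-ai-bom) in dependencyManagement.
     These are the pre-1.0 milestone artifact names; Spring AI 1.0+ renamed them to
     spring-ai-starter-vector-store-qdrant and spring-ai-starter-model-openai. -->
<dependencies>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-qdrant-store-spring-boot-starter</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    </dependency>
</dependencies>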

application.yml configuration

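spring:
  ai:
    vectorstore:
      qdrant:
        host: ${QDRANT_HOST:localhost}
        port: ${QDRANT_PORT:6334}   # gRPC port (Spring AI talks to Qdrant over gRPC)
        api-key: ${QDRANT_API_KEY:}
        use-tls: false
        collection-name: knowledge_base

        # HNSW index parameters (core settings)
        # Note: Spring AI's Qdrant auto-configuration does not bind hnsw-config or
        # search-params. These keys document the intended values; the collection-creation
        # code below is what actually applies them.
        hnsw-config:
          # M: maximum number of neighbors per node
          # Suggested: accuracy first = 32, balanced = 16, memory first = 8
          m: 32

          # ef_construction: candidate-set size during index build
          # Suggested: quality first = 200, balanced = 100, fast build = 50
          ef-construction: 200

          # Vector distance metric
          # Cosine: text semantic similarity
          # Dot: vectors that are already normalized
          # Euclidean: geometric (coordinate) distance
          distance: Cosine

        # Query parameters
        search-params:
          # ef: candidate-set size at query time
          # Suggested: high-precision RAG = 128, normal retrieval = 64, fast retrieval = 32
          hnsw-ef: 64
          exact: false  # false = approximate HNSW search, true = exact brute-force search

    openai:
      api-key: ${OPENAI_API_KEY}
      embedding:
        options:
          model: text-embedding-3-small  # 1536 dimensions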
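The distance comments above say Dot suits vectors that are already normalized. This is because, for unit-length vectors, the dot product equals cosine similarity, so both metrics rank results identically. According to Qdrant's documentation, its Cosine metric is actually computed as a dot product over vectors that are normalized at upload time. The small sketch below (hypothetical class CosineVsDot) checks the identity.

```java
/**
 * Hypothetical demo: after L2 normalization, the dot product equals cosine similarity.
 */
class CosineVsDot {

    static double dot(float[] a, float[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            s += (double) a[i] * b[i];
        }
        return s;
    }

    static double cosine(float[] a, float[] b) {
        return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
    }

    /** Returns a unit-length copy of v. */
    static float[] normalize(float[] v) {
        double norm = Math.sqrt(dot(v, v));
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = (float) (v[i] / norm);
        }
        return out;
    }

    public static void main(String[] args) {
        float[] a = {3f, 4f, 0f};
        float[] b = {4f, 3f, 0f};
        System.out.println(cosine(a, b));                     // 0.96
        // Same value up to float rounding, without any square roots at query time
        System.out.println(dot(normalize(a), normalize(b)));
    }
}
```

Normalizing once at write time moves the square roots out of the query path. This is why Dot is the cheaper choice when the embedding model already outputs unit vectors.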

Java configuration code

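import io.qdrant.client.QdrantClient;
import io.qdrant.client.QdrantGrpcClient;
import io.qdrant.client.grpc.Collections.CreateCollection;
import io.qdrant.client.grpc.Collections.Distance;
import io.qdrant.client.grpc.Collections.HnswConfigDiff;
import io.qdrant.client.grpc.Collections.OptimizersConfigDiff;
import io.qdrant.client.grpc.Collections.VectorParams;
import io.qdrant.client.grpc.Collections.VectorsConfig;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.ai.vectorstore.qdrant.QdrantVectorStore;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.ApplicationRunner;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * Qdrant vector store configuration
 * Creates the collection with explicit HNSW parameters
 */
@Slf4j
@Configuration
public class QdrantVectorStoreConfig {

    // Read the same keys as the application.yml above
    @Value("${spring.ai.vectorstore.qdrant.host:localhost}")
    private String qdrantHost;

    @Value("${spring.ai.vectorstore.qdrant.port:6334}")
    private int qdrantPort;

    @Value("${spring.ai.vectorstore.qdrant.api-key:}")
    private String apiKey;

    @Value("${spring.ai.vectorstore.qdrant.collection-name:knowledge_base}")
    private String collectionName;

    @Bean
    public QdrantClient qdrantClient() {
        // Third argument is useTls (false, matching use-tls in application.yml)
        QdrantGrpcClient.Builder builder = QdrantGrpcClient.newBuilder(qdrantHost, qdrantPort, false);
        if (apiKey != null && !apiKey.isEmpty()) {
            builder.withApiKey(apiKey);
        }
        return new QdrantClient(builder.build());
    }

    @Bean
    public VectorStore vectorStore(QdrantClient qdrantClient,
                                   EmbeddingModel embeddingModel) {
        return QdrantVectorStore.builder(qdrantClient, embeddingModel)
            .collectionName(collectionName)
            // false: otherwise the store would create the collection with Qdrant's default
            // HNSW settings before the runner below gets a chance to apply M/ef_construct
            .initializeSchema(false)
            .build();
    }

    /**
     * HNSW configuration applied when the collection is created
     * This is the real entry point for controlling HNSW parameters
     */
    @Bean
    public ApplicationRunner initializeQdrantCollection(QdrantClient qdrantClient) {
        return args -> {
            // Check whether the collection already exists
            if (qdrantClient.collectionExistsAsync(collectionName).get()) {
                log.info("Collection {} already exists, skipping initialization", collectionName);
                return;
            }
            log.info("Collection {} does not exist, creating it", collectionName);

            // Create the collection with HNSW parameters
            CreateCollection createCollection = CreateCollection.newBuilder()
                .setCollectionName(collectionName)
                .setVectorsConfig(VectorsConfig.newBuilder()
                    .setParams(VectorParams.newBuilder()
                        .setSize(1536)  // dimension of text-embedding-3-small
                        .setDistance(Distance.Cosine)
                        .setHnswConfig(HnswConfigDiff.newBuilder()
                            .setM(32)               // key parameter: neighbors per node
                            .setEfConstruct(200)    // key parameter: build quality
                            .setFullScanThreshold(10000)  // in KB: smaller segments use a full scan
                            .build())
                        .build())
                    .build())
                // Optimizer configuration (both thresholds are sizes in KB, not point counts)
                .setOptimizersConfig(OptimizersConfigDiff.newBuilder()
                    .setIndexingThreshold(20000)  // segments above ~20 MB of vectors get an HNSW index
                    .setMemmapThreshold(50000)    // segments above ~50 MB of vectors are memory-mapped
                    .build())
                .build();

            qdrantClient.createCollectionAsync(createCollection).get();
            log.info("Collection {} created, HNSW parameters: M=32, ef_construction=200",
                     collectionName);
        };
    }
}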

Vector store service

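import io.qdrant.client.QdrantClient;
import io.qdrant.client.grpc.Points.ScoredPoint;
import io.qdrant.client.grpc.Points.SearchParams;
import io.qdrant.client.grpc.Points.SearchPoints;
import io.qdrant.client.grpc.Points.WithPayloadSelector;
import lombok.extern.slf4j.Slf4j;
import org.springframework.ai.document.Document;
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;

/**
 * Vector store service
 * Wraps storage and retrieval operations
 */
@Service
@Slf4j
public class VectorStoreService {

    private final VectorStore vectorStore;
    private final EmbeddingModel embeddingModel;
    private final QdrantClient qdrantClient;

    @Value("${spring.ai.vectorstore.qdrant.collection-name:knowledge_base}")
    private String collectionName;

    public VectorStoreService(VectorStore vectorStore,
                              EmbeddingModel embeddingModel,
                              QdrantClient qdrantClient) {
        this.vectorStore = vectorStore;
        this.embeddingModel = embeddingModel;
        this.qdrantClient = qdrantClient;
    }

    /**
     * Store document vectors in batches
     *
     * @param documents documents to store
     * @param batchSize batch size (100-500 recommended)
     */
    public void storeBatch(List<Document> documents, int batchSize) {
        List<List<Document>> batches = partition(documents, batchSize);

        log.info("Storing {} documents in {} batches", documents.size(), batches.size());

        for (int i = 0; i < batches.size(); i++) {
            List<Document> batch = batches.get(i);
            try {
                vectorStore.add(batch);
                log.info("Batch {}/{} done, stored {} documents", i + 1, batches.size(), batch.size());
            } catch (Exception e) {
                log.error("Batch {} failed: {}", i + 1, e.getMessage());
                // Retry with increasing back-off
                retryBatch(batch, 3);
            }
        }
    }

    /**
     * Similarity search with an optional metadata filter, using Qdrant's default ef
     *
     * @param query            query text
     * @param topK             number of results
     * @param filterExpression filter, e.g. "category == 'tech' AND year >= 2024"
     */
    public List<Document> similaritySearch(String query, int topK, String filterExpression) {
        SearchRequest.Builder requestBuilder = SearchRequest.builder()
            .query(query)
            .topK(topK);

        // Add the filter
        if (filterExpression != null && !filterExpression.isEmpty()) {
            requestBuilder.filterExpression(filterExpression);
        }

        long start = System.currentTimeMillis();
        List<Document> results = vectorStore.similaritySearch(requestBuilder.build());
        long elapsed = System.currentTimeMillis() - start;

        log.debug("Vector search done: query='{}', topK={}, results={}, took={}ms",
                  query.substring(0, Math.min(50, query.length())),
                  topK, results.size(), elapsed);

        return results;
    }

    /**
     * Search with a per-request ef (hnsw_ef).
     * Spring AI's SearchRequest has no hnsw_ef option, so this path calls the native
     * Qdrant client directly and returns Qdrant's ScoredPoint results (with payload).
     */
    public List<ScoredPoint> searchWithEf(String query, int topK, int efSearch) {
        float[] embedding = embeddingModel.embed(query);   // float[] in Spring AI 1.x
        List<Float> vector = new ArrayList<>(embedding.length);
        for (float f : embedding) {
            vector.add(f);
        }

        SearchPoints request = SearchPoints.newBuilder()
            .setCollectionName(collectionName)
            .addAllVector(vector)
            .setLimit(topK)
            .setParams(SearchParams.newBuilder()
                .setHnswEf(efSearch)
                .setExact(false)
                .build())
            .setWithPayload(WithPayloadSelector.newBuilder().setEnable(true).build())
            .build();

        try {
            return qdrantClient.searchAsync(request).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Search interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Qdrant search failed", e.getCause());
        }
    }

    /**
     * High-precision search (for important decisions)
     * Uses a larger ef, trading speed for accuracy
     */
    public List<ScoredPoint> highPrecisionSearch(String query, int topK) {
        return searchWithEf(query, topK, 256);
    }

    /**
     * Fast search (for latency-sensitive cases such as real-time autocomplete)
     */
    public List<ScoredPoint> fastSearch(String query, int topK) {
        return searchWithEf(query, topK, 32);
    }

    private void retryBatch(List<Document> batch, int maxRetries) {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                Thread.sleep(1000L * attempt);   // back off 1s, 2s, 3s...
                vectorStore.add(batch);
                log.info("Batch retry {} succeeded", attempt);
                return;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Batch retry interrupted", e);
            } catch (Exception e) {
                log.error("Batch retry {} failed: {}", attempt, e.getMessage());
            }
        }
        throw new RuntimeException("Batch storage failed after " + maxRetries + " retries");
    }

    private <T> List<List<T>> partition(List<T> list, int size) {
        List<List<T>> partitions = new ArrayList<>();
        for (int i = 0; i < list.size(); i += size) {
            partitions.add(list.subList(i, Math.min(i + size, list.size())));
        }
        return partitions;
    }
}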

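Choosing Vector Dimensions: How Dimension Affects Accuracy, Speed, and Memory

Different embedding models produce vectors of different dimensions, and that choice directly affects system performance.

Model	Dimensions	Memory (1M vectors)	P99 search latency	Semantic quality	Use case
text-embedding-3-small	1536	5.8GB	38ms	★★★★	General-purpose RAG
text-embedding-3-large	3072	11.5GB	75ms	★★★★★	High-precision RAG
text-embedding-ada-002	1536	5.8GB	38ms	★★★	Legacy systems
BGE-M3	1024	3.9GB	25ms	★★★★	Optimized for Chinese
BAAI/bge-small-zh	512	2.0GB	15ms	★★★	Lightweight Chinese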
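The memory column can be sanity-checked with simple arithmetic. A float32 vector takes 4 bytes per dimension, so 1M × 1536 × 4 bytes ≈ 6.1 GB (about 5.7 GiB), which is in line with the 5.8GB in the table. The sketch below (hypothetical class HnswMemoryEstimate) also adds a rough estimate for the HNSW graph itself. It assumes layer 0 stores up to 2·M neighbor ids of 4 bytes each and ignores the much smaller upper layers. It also shows the rule of thumb from Qdrant's capacity-planning guide: vectors × dimension × 4 bytes × 1.5. These are estimates, not measurements.

```java
/**
 * Hypothetical back-of-envelope memory estimate for float32 vectors in an HNSW index.
 * Assumptions: 4 bytes per dimension; layer 0 holds up to 2*M neighbor ids of 4 bytes;
 * upper layers and per-point metadata are ignored in graphLinkBytes().
 */
class HnswMemoryEstimate {

    /** Raw float32 vector storage. */
    static long rawVectorBytes(long n, int dim) {
        return n * dim * 4L;
    }

    /** Layer-0 adjacency lists only (assumed 2*M links of 4 bytes per node). */
    static long graphLinkBytes(long n, int m) {
        return n * 2L * m * 4L;
    }

    /** Qdrant capacity-planning rule of thumb: raw vectors * 1.5. */
    static long qdrantRuleOfThumbBytes(long n, int dim) {
        return rawVectorBytes(n, dim) * 3 / 2;
    }

    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024 * 1024);
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        int dim = 1536;
        int m = 32;
        System.out.printf("raw vectors  : %.2f GiB%n", toGiB(rawVectorBytes(n, dim)));
        System.out.printf("graph links  : %.2f GiB%n", toGiB(graphLinkBytes(n, m)));
        System.out.printf("rule of thumb: %.2f GiB%n", toGiB(qdrantRuleOfThumbBytes(n, dim)));
    }
}
```

The arithmetic shows that the vectors, not the graph, dominate memory. This is why reducing dimensions or quantizing vectors saves far more than lowering M.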

Dimension reduction with PCA:

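/**
 * Vector dimensionality reduction
 * Uses PCA to project high-dimensional vectors into fewer dimensions, reducing memory and search time
 *
 * Note: reduction loses some semantic information; evaluate whether the loss is acceptable
 */
@Service
public class VectorDimensionReducer {

    /**
     * Reduce a 1536-dim vector to 512 dims
     *
     * PCA steps:
     * 1. Collect sample vectors (at least 1,000)
     * 2. Compute the mean vector and the covariance matrix of the centered samples
     * 3. Take the top 512 principal components (eigenvectors)
     * 4. Center every vector with the same mean and project it onto the components
     *
     * In production, a library such as Apache Commons Math is recommended for steps 2-3
     *
     * @param originalVector   vector to reduce
     * @param mean             per-dimension mean of the PCA training samples
     * @param projectionMatrix principal components, shape [targetDim][originalDim]
     */
    public float[] reduce(float[] originalVector, float[] mean, float[][] projectionMatrix) {
        int targetDim = projectionMatrix.length;
        float[] reduced = new float[targetDim];

        for (int i = 0; i < targetDim; i++) {
            float sum = 0;
            for (int j = 0; j < originalVector.length; j++) {
                // PCA projects the mean-centered vector
                sum += (originalVector[j] - mean[j]) * projectionMatrix[i][j];
            }
            reduced[i] = sum;
        }

        // L2-normalize so cosine/dot similarities remain comparable
        float norm = 0;
        for (float v : reduced) norm += v * v;
        norm = (float) Math.sqrt(norm);
        if (norm > 0) {
            for (int i = 0; i < reduced.length; i++) reduced[i] /= norm;
        }

        return reduced;
    }
}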
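The reduce() method above assumes a projection matrix has already been computed. One dependency-free way to obtain it is power iteration with deflation on the covariance matrix of mean-centered samples. This is sketched below in a hypothetical class, PcaPowerIteration. It returns the sample mean and the top-k components as the rows of a [k][d] matrix. In PCA, that same mean is subtracted from each vector before projection. With 1536 dimensions, the O(d²) covariance and per-iteration cost add up quickly, so a linear-algebra library such as Apache Commons Math remains the more practical choice in production.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Hypothetical PCA fit: mean, covariance, then top-k eigenvectors via power iteration
 * with deflation. Suitable for small dimensions / illustration only.
 */
class PcaPowerIteration {

    /** Per-dimension mean of the samples. */
    static float[] mean(float[][] x) {
        int d = x[0].length;
        double[] sum = new double[d];
        for (float[] row : x) {
            for (int j = 0; j < d; j++) sum[j] += row[j];
        }
        float[] mu = new float[d];
        for (int j = 0; j < d; j++) mu[j] = (float) (sum[j] / x.length);
        return mu;
    }

    /** Sample covariance matrix of the mean-centered data. */
    static double[][] covariance(float[][] x, float[] mu) {
        int d = mu.length;
        double[][] c = new double[d][d];
        for (float[] row : x) {
            for (int i = 0; i < d; i++) {
                double di = row[i] - mu[i];
                for (int j = 0; j < d; j++) c[i][j] += di * (row[j] - mu[j]);
            }
        }
        for (int i = 0; i < d; i++) {
            for (int j = 0; j < d; j++) c[i][j] /= (x.length - 1);
        }
        return c;
    }

    /** Top-k eigenvectors (as rows) of a symmetric PSD matrix. */
    static float[][] topComponents(double[][] cov, int k, int iterations, long seed) {
        int d = cov.length;
        double[][] a = new double[d][];
        for (int i = 0; i < d; i++) a[i] = cov[i].clone();   // deflation mutates this copy
        Random rnd = new Random(seed);
        float[][] components = new float[k][d];

        for (int c = 0; c < k; c++) {
            double[] v = new double[d];
            for (int i = 0; i < d; i++) v[i] = rnd.nextGaussian();
            normalize(v);
            double eigenvalue = 0;
            for (int t = 0; t < iterations; t++) {
                double[] w = new double[d];
                for (int i = 0; i < d; i++) {
                    for (int j = 0; j < d; j++) w[i] += a[i][j] * v[j];
                }
                double norm = normalize(w);
                if (norm == 0) break;        // no variance left in the remaining directions
                v = w;
                eigenvalue = norm;           // ||A v|| approaches the eigenvalue as v converges
            }
            for (int i = 0; i < d; i++) components[c][i] = (float) v[i];
            // Deflation: remove eigenvalue * v v^T so the next pass finds the next component
            for (int i = 0; i < d; i++) {
                for (int j = 0; j < d; j++) a[i][j] -= eigenvalue * v[i] * v[j];
            }
        }
        return components;
    }

    /** Scales v to unit length in place and returns its original norm. */
    private static double normalize(double[] v) {
        double norm = 0;
        for (double e : v) norm += e * e;
        norm = Math.sqrt(norm);
        if (norm > 0) {
            for (int i = 0; i < v.length; i++) v[i] /= norm;
        }
        return norm;
    }

    public static void main(String[] args) {
        float[][] samples = {{1f, 2f}, {2f, 4.1f}, {3f, 5.9f}, {4f, 8.2f}};
        float[] mu = mean(samples);
        float[][] pcs = topComponents(covariance(samples, mu), 1, 100, 7L);
        System.out.println(Arrays.toString(pcs[0]));   // direction of greatest variance
    }
}
```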

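Sharding Strategy: Partitioning Large-Scale Vector Data

When the data exceeds the capacity of a single node, it has to be split across shards.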

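import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;

import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

/**
 * Vector data sharding strategies
 *
 * Two common strategies:
 * 1. Hash sharding - spreads load evenly, but every query must fan out to all shards and merge
 * 2. Semantic sharding (by category) - queries can target a specific shard, but load may be uneven
 *
 * Register as a bean from a @Configuration class that builds one VectorStore per shard collection.
 */
public class VectorShardingStrategy {

    private final List<String> shardCollectionNames;
    private final List<VectorStore> shardedStores;

    public VectorShardingStrategy(List<String> shardCollectionNames,
                                  List<VectorStore> shardedStores) {
        this.shardCollectionNames = shardCollectionNames;
        this.shardedStores = shardedStores;
    }

    // Strategy 1: hash sharding (recommended for even load)
    public String getShardForInsert(String documentId) {
        // floorMod instead of Math.abs(...) %: Math.abs(Integer.MIN_VALUE) is still negative
        int shardIndex = Math.floorMod(documentId.hashCode(), shardCollectionNames.size());
        return shardCollectionNames.get(shardIndex);
    }

    // Strategy 2: semantic sharding (by business category)
    public String getShardByCategory(String category) {
        return switch (category) {
            case "tech" -> "shard_tech";
            case "finance" -> "shard_finance";
            case "legal" -> "shard_legal";
            default -> "shard_general";
        };
    }

    /**
     * Query all shards in parallel and merge the results
     *
     * @param topK number of results requested from each shard and returned overall
     */
    public List<Document> searchAcrossShards(String query, int topK) {
        SearchRequest request = SearchRequest.builder().query(query).topK(topK).build();

        // Query all shards in parallel
        List<CompletableFuture<List<Document>>> futures = shardedStores.stream()
            .map(store -> CompletableFuture.supplyAsync(() -> store.similaritySearch(request)))
            .toList();

        // Wait for every shard's results
        List<Document> allResults = futures.stream()
            .map(CompletableFuture::join)
            .flatMap(Collection::stream)
            .toList();

        // Global re-ranking by similarity score (Document#getScore, higher = more similar)
        return allResults.stream()
            .sorted(Comparator.comparingDouble(
                (Document doc) -> doc.getScore() != null ? doc.getScore() : 0.0).reversed())
            .limit(topK)  // keep only the global top-K
            .collect(Collectors.toList());
    }
}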
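The merge step in searchAcrossShards sorts every hit from every shard. An alternative, sketched below without any Spring AI types, keeps a bounded min-heap of size k, so the merge costs O(total · log k). Hit is a hypothetical record standing in for a search hit, where a higher score means more similar. Duplicates across shards are not removed here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch: merge per-shard hit lists into a global top-k using a bounded heap.
 */
class ShardMerge {

    /** Stand-in for a search hit; higher score = more similar. */
    record Hit(String id, double score) {}

    static List<Hit> mergeTopK(List<List<Hit>> perShard, int k) {
        // Min-heap on score: the root is always the weakest of the current top-k
        PriorityQueue<Hit> heap = new PriorityQueue<>(Comparator.comparingDouble(Hit::score));
        for (List<Hit> shard : perShard) {
            for (Hit h : shard) {
                heap.offer(h);
                if (heap.size() > k) {
                    heap.poll();   // evict the weakest hit
                }
            }
        }
        List<Hit> out = new ArrayList<>(heap);
        out.sort(Comparator.comparingDouble(Hit::score).reversed());   // best first
        return out;
    }

    public static void main(String[] args) {
        List<Hit> shardA = List.of(new Hit("a", 0.9), new Hit("b", 0.5));
        List<Hit> shardB = List.of(new Hit("c", 0.8), new Hit("d", 0.95));
        System.out.println(mergeTopK(List.of(shardA, shardB), 3));   // order: d, a, c
    }
}
```

Because each shard already returns results sorted best-first, a k-way merge could stop even earlier. The heap version is simpler, however, and does not rely on that ordering.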

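Memory Optimization: Cutting Memory Use with Product Quantization

PQ (Product Quantization) is a vector compression technique that can reduce memory by 4x to 32x at the cost of a slight drop in accuracy.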
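Here is how product quantization works in concrete terms. A d-dimensional vector is split into m sub-vectors, and each sub-vector is replaced by the 1-byte index of its nearest centroid in that sub-space's codebook. That shrinks 4·d bytes down to m bytes. The toy encoder/decoder below assumes the codebooks are already trained (real systems train them with k-means, which is omitted here). It is not Qdrant's implementation, and ToyProductQuantizer is a hypothetical name. With one byte per 4-dimensional sub-vector, a 1536-dim float32 vector (6,144 bytes) becomes 384 bytes, a 16x reduction.

```java
import java.util.Arrays;

/**
 * Hypothetical toy PQ codec with pre-trained codebooks: codebooks[s][c] is centroid c
 * (at most 256 per sub-space, so it fits in one byte) of sub-space s.
 */
class ToyProductQuantizer {

    final int subDim;                 // dimensions per sub-vector
    final float[][][] codebooks;      // [m][numCentroids][subDim]

    ToyProductQuantizer(float[][][] codebooks) {
        this.codebooks = codebooks;
        this.subDim = codebooks[0][0].length;
    }

    /** Replaces each sub-vector by the index of its nearest centroid (squared L2). */
    byte[] encode(float[] v) {
        int m = codebooks.length;
        byte[] codes = new byte[m];
        for (int s = 0; s < m; s++) {
            int best = 0;
            double bestDist = Double.MAX_VALUE;
            for (int c = 0; c < codebooks[s].length; c++) {
                double dist = 0;
                for (int j = 0; j < subDim; j++) {
                    double diff = v[s * subDim + j] - codebooks[s][c][j];
                    dist += diff * diff;
                }
                if (dist < bestDist) {
                    bestDist = dist;
                    best = c;
                }
            }
            codes[s] = (byte) best;   // indices 128..255 wrap to negative byte values
        }
        return codes;
    }

    /** Approximate reconstruction: concatenates the chosen centroids. */
    float[] decode(byte[] codes) {
        float[] out = new float[codes.length * subDim];
        for (int s = 0; s < codes.length; s++) {
            float[] centroid = codebooks[s][codes[s] & 0xFF];   // & 0xFF undoes the sign wrap
            System.arraycopy(centroid, 0, out, s * subDim, subDim);
        }
        return out;
    }

    public static void main(String[] args) {
        float[][][] books = {
            {{0f, 0f}, {1f, 1f}},    // sub-space 0
            {{0f, 0f}, {5f, 5f}}     // sub-space 1
        };
        ToyProductQuantizer pq = new ToyProductQuantizer(books);
        byte[] codes = pq.encode(new float[]{0.9f, 1.1f, 4.8f, 5.2f});
        System.out.println(Arrays.toString(codes));               // [1, 1]
        System.out.println(Arrays.toString(pq.decode(codes)));    // [1.0, 1.0, 5.0, 5.0]
    }
}
```

The accuracy loss comes from decode() returning centroids instead of the original values. This is also why Qdrant can rescore the top candidates using the original vectors.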

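import io.qdrant.client.QdrantClient;
import io.qdrant.client.grpc.Collections.CompressionRatio;
import io.qdrant.client.grpc.Collections.CreateCollection;
import io.qdrant.client.grpc.Collections.Distance;
import io.qdrant.client.grpc.Collections.HnswConfigDiff;
import io.qdrant.client.grpc.Collections.ProductQuantization;
import io.qdrant.client.grpc.Collections.QuantizationConfig;
import io.qdrant.client.grpc.Collections.VectorParams;
import io.qdrant.client.grpc.Collections.VectorsConfig;
import lombok.extern.slf4j.Slf4j;
import org.springframework.context.annotation.Configuration;

/**
 * Enabling PQ compression in Qdrant
 */
@Slf4j
@Configuration
public class QdrantPQConfig {

    /**
     * Create a collection with PQ compression
     * x16 shrinks the in-memory vectors about 16x, with roughly 2-3% accuracy loss
     */
    public void createCollectionWithPQ(QdrantClient client, String collectionName) {
        CreateCollection createCollection = CreateCollection.newBuilder()
            .setCollectionName(collectionName)
            .setVectorsConfig(VectorsConfig.newBuilder()
                .setParams(VectorParams.newBuilder()
                    .setSize(1536)
                    .setDistance(Distance.Cosine)
                    .setHnswConfig(HnswConfigDiff.newBuilder()
                        .setM(32)
                        .setEfConstruct(200)
                        .build())
                    .setQuantizationConfig(
                        QuantizationConfig.newBuilder()
                            .setProduct(ProductQuantization.newBuilder()
                                // x16: 1536 float32 values (6,144 bytes) -> about 384 bytes per
                                // vector, i.e. one 1-byte code per 4-dimension sub-vector
                                .setCompression(CompressionRatio.x16)
                                .setAlwaysRam(true)  // keep the quantized vectors in RAM
                                .build())
                            .build())
                    .build())
                .build())
            .build();

        try {
            client.createCollectionAsync(createCollection).get();
            log.info("Collection {} with PQ compression created", collectionName);
        } catch (Exception e) {
            throw new RuntimeException("Failed to create collection", e);
        }
    }
}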

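Disk Indexes: On-Disk Storage for Very Large Vector Sets

When the full set of vectors no longer fits in memory, Qdrant's memmap mode keeps the vectors on disk and loads them into memory on demand (similar to OS virtual memory).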

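# Qdrant on-disk storage configuration (qdrant.yaml)
storage:
  # Vector data directory (use a fast SSD)
  storage_path: /data/qdrant/storage

  # Keep payloads on disk rather than in RAM
  # (this setting concerns payloads only, not the HNSW index)
  on_disk_payload: true

  # Optimizer configuration (sizes in KB, evaluated per segment)
  optimizers:
    # Segments with more vector data than this get an HNSW index;
    # smaller segments use brute-force search, which is faster for small amounts of data
    indexing_threshold_kb: 20000

    # Segments with more vector data than this are memory-mapped from disk
    memmap_threshold_kb: 50000

  # HNSW index defaults
  hnsw_index:
    # Keep the HNSW graph in memory (raw vectors can still live on disk)
    on_disk: false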

/**
 * Disk-optimized collection configuration (Java)
 * Standalone method: place it in a class that has access to a QdrantClient, as in the examples above
 */
public void createDiskOptimizedCollection(QdrantClient client, String collectionName) throws Exception {
    CreateCollection createCollection = CreateCollection.newBuilder()
        .setCollectionName(collectionName)
        .setVectorsConfig(VectorsConfig.newBuilder()
            .setParams(VectorParams.newBuilder()
                .setSize(1536)
                .setDistance(Distance.Cosine)
                .setOnDisk(true)  // store the raw vectors on disk (memory-mapped)
                .setHnswConfig(HnswConfigDiff.newBuilder()
                    .setM(32)
                    .setEfConstruct(200)
                    .setOnDisk(false)  // keep the HNSW graph in memory (the key point!)
                    .build())
                .build())
            .build())
        .setOptimizersConfig(OptimizersConfigDiff.newBuilder()
            .setMemmapThreshold(50000)  // in KB: segments above ~50 MB are memory-mapped
            .build())
        .build();

    client.createCollectionAsync(createCollection).get();
}

Benchmark: Measuring Vector Search Performance from Java

import io.qdrant.client.QdrantClient;
import io.qdrant.client.grpc.Collections.CollectionStatus;
import io.qdrant.client.grpc.Collections.CreateCollection;
import io.qdrant.client.grpc.Collections.Distance;
import io.qdrant.client.grpc.Collections.HnswConfigDiff;
import io.qdrant.client.grpc.Collections.VectorParams;
import io.qdrant.client.grpc.Collections.VectorsConfig;
import io.qdrant.client.grpc.Points.PointId;
import io.qdrant.client.grpc.Points.PointStruct;
import io.qdrant.client.grpc.Points.SearchParams;
import io.qdrant.client.grpc.Points.SearchPoints;
import io.qdrant.client.grpc.Points.Vector;
import io.qdrant.client.grpc.Points.Vectors;
import io.qdrant.client.grpc.Points.WithPayloadSelector;
import lombok.Builder;
import lombok.Data;
import lombok.extern.slf4j.Slf4j;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

/**
 * HNSW parameter benchmark
 *
 * Measures performance under different parameter settings to help choose a configuration.
 * Recall is measured against Qdrant's exact (brute-force) search as ground truth.
 * Random vectors have no cluster structure, so absolute numbers will differ from real embeddings.
 */
@Slf4j
@SpringBootTest
class HNSWBenchmarkTest {

    @Autowired
    private QdrantClient qdrantClient;

    private static final int TEST_VECTORS = 100_000;  // vectors in the test collection
    private static final int QUERY_COUNT = 1_000;     // number of test queries
    private static final int TOP_K = 10;              // nearest neighbors per query
    private static final int DIM = 1536;

    private final Random random = new Random(42);     // fixed seed for reproducible data

    /**
     * Benchmark different ef values
     */
    @Test
    void benchmarkEfParameter() throws Exception {
        String baseCollection = "benchmark_base";
        prepareTestCollection(baseCollection, TEST_VECTORS, 32, 200);

        int[] efValues = {10, 32, 64, 128, 256};
        List<float[]> queryVectors = generateRandomVectors(QUERY_COUNT, DIM);

        System.out.println("\n=== ef benchmark results ===");
        System.out.printf("%-8s %-12s %-12s %-10s%n", "ef", "P50(ms)", "P99(ms)", "Recall");
        System.out.println("-".repeat(45));

        for (int ef : efValues) {
            BenchmarkResult result = runQueryBenchmark(baseCollection, queryVectors, TOP_K, ef);
            System.out.printf("%-8d %-12.1f %-12.1f %-10.1f%%%n",
                ef, result.getP50Ms(), result.getP99Ms(), result.getRecallRate() * 100);
        }
    }

    /**
     * Benchmark different M values (ef fixed at 64)
     * Memory is not measured here; read it from Qdrant's /metrics endpoint or from the host
     */
    @Test
    void benchmarkMParameter() throws Exception {
        int[] mValues = {8, 16, 24, 32, 48};
        List<float[]> queryVectors = generateRandomVectors(QUERY_COUNT, DIM);

        System.out.println("\n=== M benchmark results ===");
        System.out.printf("%-6s %-12s %-12s %-10s%n", "M", "P50(ms)", "P99(ms)", "Recall");
        System.out.println("-".repeat(45));

        for (int m : mValues) {
            String collectionName = "benchmark_m_" + m;
            prepareTestCollection(collectionName, TEST_VECTORS, m, 200);

            BenchmarkResult result = runQueryBenchmark(collectionName, queryVectors, TOP_K, 64);

            System.out.printf("%-6d %-12.1f %-12.1f %-10.1f%%%n",
                m, result.getP50Ms(), result.getP99Ms(), result.getRecallRate() * 100);
        }
    }

    /**
     * Run the query benchmark: latency percentiles plus recall averaged over all queries
     */
    private BenchmarkResult runQueryBenchmark(String collectionName,
                                              List<float[]> queryVectors,
                                              int topK, int ef) {
        // Warm-up
        for (int i = 0; i < 100; i++) {
            search(collectionName, queryVectors.get(i % queryVectors.size()), topK, ef, false);
        }

        List<Long> latencies = new ArrayList<>();
        double recallSum = 0;

        // Timed run
        for (float[] queryVector : queryVectors) {
            long start = System.nanoTime();
            List<String> results = search(collectionName, queryVector, topK, ef, false);
            latencies.add(TimeUnit.NANOSECONDS.toMicros(System.nanoTime() - start));

            // Ground truth: exact search over the same collection (not timed)
            Set<String> groundTruth = new HashSet<>(
                search(collectionName, queryVector, topK, ef, true));
            long hits = results.stream().filter(groundTruth::contains).count();
            recallSum += (double) hits / topK;
        }

        latencies.sort(Long::compare);

        return BenchmarkResult.builder()
            .p50Ms(latencies.get((int) (latencies.size() * 0.50)) / 1000.0)
            .p99Ms(latencies.get((int) (latencies.size() * 0.99)) / 1000.0)
            .recallRate(recallSum / queryVectors.size())
            .build();
    }

    /** Qdrant search with an explicit ef; exact=true runs a brute-force search instead */
    private List<String> search(String collection, float[] vector,
                                int topK, int ef, boolean exact) {
        SearchPoints searchRequest = SearchPoints.newBuilder()
            .setCollectionName(collection)
            .addAllVector(floatArrayToList(vector))
            .setLimit(topK)
            .setParams(SearchParams.newBuilder()
                .setHnswEf(ef)
                .setExact(exact)
                .build())
            .setWithPayload(WithPayloadSelector.newBuilder()
                .setEnable(false)  // IDs only, no payload
                .build())
            .build();

        try {
            return qdrantClient.searchAsync(searchRequest).get().stream()
                .map(p -> p.getId().getUuid())
                .collect(Collectors.toList());
        } catch (Exception e) {
            throw new RuntimeException("Search failed", e);
        }
    }

    /**
     * Prepare a test collection (insert random vectors) and wait until indexing finishes
     */
    private void prepareTestCollection(String collectionName, int vectorCount,
                                       int m, int efConstruction) throws Exception {
        // Recreate the collection so reruns start from a clean state
        if (qdrantClient.collectionExistsAsync(collectionName).get()) {
            qdrantClient.deleteCollectionAsync(collectionName).get();
        }
        qdrantClient.createCollectionAsync(
            CreateCollection.newBuilder()
                .setCollectionName(collectionName)
                .setVectorsConfig(VectorsConfig.newBuilder()
                    .setParams(VectorParams.newBuilder()
                        .setSize(DIM)
                        .setDistance(Distance.Cosine)
                        .setHnswConfig(HnswConfigDiff.newBuilder()
                            .setM(m)
                            .setEfConstruct(efConstruction)
                            .build())
                        .build())
                    .build())
                .build()
        ).get();

        // Insert random vectors in batches
        int batchSize = 1000;
        for (int i = 0; i < vectorCount; i += batchSize) {
            List<PointStruct> points = new ArrayList<>();
            for (int j = i; j < Math.min(i + batchSize, vectorCount); j++) {
                float[] vector = generateNormalizedRandomVector(DIM);
                points.add(PointStruct.newBuilder()
                    .setId(PointId.newBuilder().setUuid(UUID.randomUUID().toString()))
                    .setVectors(Vectors.newBuilder()
                        .setVector(Vector.newBuilder()
                            .addAllData(floatArrayToList(vector))
                            .build()))
                    .build());
            }
            qdrantClient.upsertAsync(collectionName, points).get();
        }

        // HNSW indexes are built asynchronously; wait until the collection is green so the
        // benchmark measures the index rather than not-yet-indexed segments
        while (qdrantClient.getCollectionInfoAsync(collectionName).get().getStatus()
                != CollectionStatus.Green) {
            Thread.sleep(1000);
        }

        log.info("Collection {} ready with {} vectors", collectionName, vectorCount);
    }

    private List<float[]> generateRandomVectors(int count, int dim) {
        List<float[]> vectors = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            vectors.add(generateNormalizedRandomVector(dim));
        }
        return vectors;
    }

    private List<Float> floatArrayToList(float[] arr) {
        List<Float> list = new ArrayList<>(arr.length);
        for (float f : arr) list.add(f);
        return list;
    }

    private float[] generateNormalizedRandomVector(int dim) {
        float[] v = new float[dim];
        float norm = 0;
        for (int i = 0; i < dim; i++) {
            v[i] = (float) random.nextGaussian();
            norm += v[i] * v[i];
        }
        norm = (float) Math.sqrt(norm);
        for (int i = 0; i < dim; i++) v[i] /= norm;
        return v;
    }

    @Data
    @Builder
    static class BenchmarkResult {
        private double p50Ms;
        private double p99Ms;
        private double recallRate;
    }
}

HNSW Tuning Decision Tree

FAQ

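Q1: Can I change the HNSW parameters of an existing collection?

A: M and ef_construction only take effect after the index is rebuilt, because they determine how the graph is built. ef, the query parameter, can be set per query without any rebuild.

To rebuild the index:

Create a new collection with the new parameters
Migrate the data into the new collection
Switch read traffic to the new collection
Delete the old collection

Q2: How do I know whether the current recall is good enough?

A: Build a "golden test set". Collect 100-500 real queries with clearly known correct answers, and run them regularly to measure recall. A drop in recall usually means either the index parameters need adjusting or the data distribution has changed.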
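Below is a minimal sketch of the recall@k computation for such a golden set. It assumes each query comes with a set of expected relevant ids; the input shapes and the class name GoldenSetRecall are hypothetical. When the relevant set is the exact top-k from a brute-force search, this matches the usual definition of ANN recall@k.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical golden-set evaluation: recall@k per query, averaged over all queries.
 */
class GoldenSetRecall {

    /** Fraction of the relevant ids that appear among the first k retrieved ids. */
    static double recallAtK(List<String> retrieved, Set<String> relevant, int k) {
        if (relevant.isEmpty()) {
            return 1.0;   // nothing to find: treat as full recall
        }
        Set<String> topK = new HashSet<>(retrieved.subList(0, Math.min(k, retrieved.size())));
        long hits = relevant.stream().filter(topK::contains).count();
        return (double) hits / relevant.size();
    }

    /** Mean recall@k; retrievedPerQuery and relevantPerQuery are aligned by index. */
    static double meanRecallAtK(List<List<String>> retrievedPerQuery,
                                List<Set<String>> relevantPerQuery, int k) {
        if (retrievedPerQuery.isEmpty()) {
            return 0.0;
        }
        double sum = 0;
        for (int q = 0; q < retrievedPerQuery.size(); q++) {
            sum += recallAtK(retrievedPerQuery.get(q), relevantPerQuery.get(q), k);
        }
        return sum / retrievedPerQuery.size();
    }

    public static void main(String[] args) {
        double r = meanRecallAtK(
            List.of(List.of("a", "b", "c"), List.of("x")),
            List.of(Set.of("a", "c", "x"), Set.of("x")),
            2);
        System.out.println(r);   // (1/3 + 1) / 2
    }
}
```

Tracking this number per release, or per language as Q5 suggests, shows whether a parameter change or a data shift caused a regression.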

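Q3: HNSW vs IVF vs Flat: which one should I choose?

A:

Flat (brute force): under 100,000 vectors, 100% accuracy, when memory is sufficient
HNSW: 100,000 to 50 million vectors, high accuracy (95%+), ample memory
IVF+PQ: over 50 million vectors, or severely limited memory, if some accuracy loss is acceptable (85%+)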
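The rule of thumb above can be written down as a tiny function. It is only a restatement of the three bullets (hypothetical class IndexChooser). The memoryConstrained flag stands in for "memory severely limited", and Flat is chosen only when memory is sufficient, as the first bullet requires.

```java
/**
 * Hypothetical encoding of the Q3 rule of thumb; thresholds are taken from the bullets above.
 */
class IndexChooser {

    enum IndexType { FLAT, HNSW, IVF_PQ }

    static IndexType choose(long vectorCount, boolean memoryConstrained) {
        // IVF+PQ: very large collections, or memory severely limited
        if (memoryConstrained || vectorCount > 50_000_000L) {
            return IndexType.IVF_PQ;
        }
        // Flat: small collections with enough memory, exact results
        if (vectorCount < 100_000L) {
            return IndexType.FLAT;
        }
        // HNSW: everything in between
        return IndexType.HNSW;
    }

    public static void main(String[] args) {
        System.out.println(choose(1_500_000L, false));   // HNSW, as in the opening story
    }
}
```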

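Q4: How should I interpret the relevance scores in search results?

A: With the Cosine metric, Qdrant returns cosine similarity, which ranges from -1 to 1. As a rough guide (the exact cut-offs depend on the embedding model):

Above 0.9: highly relevant
0.7-0.9: fairly relevant
0.5-0.7: somewhat relevant
Below 0.5: low relevance

Set a similarity threshold in the business layer (for example 0.7) to filter out low-quality results.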
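Applying such a threshold in the business layer is a simple filter. The sketch below uses a hypothetical ScoredHit record in place of the real result type, for example Qdrant's ScoredPoint, whose getScore() carries the similarity. Spring AI's SearchRequest builder also offers a similarityThreshold option, which pushes the same cut-off into the search request.

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical business-layer threshold filter; ScoredHit stands in for the real result type.
 */
class SimilarityThresholdFilter {

    record ScoredHit(String id, double score) {}

    /** Keeps hits whose similarity is at least minScore, preserving their order. */
    static List<ScoredHit> filter(List<ScoredHit> hits, double minScore) {
        return hits.stream()
                .filter(h -> h.score() >= minScore)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<ScoredHit> hits = List.of(
            new ScoredHit("a", 0.92), new ScoredHit("b", 0.65), new ScoredHit("c", 0.71));
        System.out.println(filter(hits, 0.7));   // keeps a and c
    }
}
```

Filtering after retrieval can return fewer than topK results. Callers such as a RAG prompt builder should therefore handle an empty or short list explicitly.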

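Q5: How do I keep retrieval accurate in multilingual scenarios?

A: Use a multilingual embedding model (such as multilingual-e5-large or BGE-M3), since these are optimized for cross-lingual semantic similarity. Also evaluate recall separately for each language on your test set, and tune the weaker languages individually.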

Vector Database Monitoring and Operations

In production, running a vector database reliably requires a proper monitoring setup.

Key metrics

/**
 * Qdrant监控指标采集
 * 
 * 通过 Qdrant 的 REST API 采集关键指标并上报到 Prometheus
 */
@Component
@Slf4j
public class QdrantMetricsCollector {
    
    private final QdrantClient qdrantClient;
    private final MeterRegistry meterRegistry;
    
    @Value("${qdrant.collection-name:knowledge_base}")
    private String collectionName;
    
    public QdrantMetricsCollector(QdrantClient qdrantClient, 
                                   MeterRegistry meterRegistry) {
        this.qdrantClient = qdrantClient;
        this.meterRegistry = meterRegistry;
    }
    
    /**
     * 每30秒采集一次集合状态指标
     */
    @Scheduled(fixedDelay = 30000)
    public void collectCollectionMetrics() {
        try {
            CollectionInfo info = qdrantClient
                .getCollectionInfoAsync(collectionName).get();
            
            // 向量总数
            meterRegistry.gauge("qdrant.vectors.count",
                Tags.of("collection", collectionName),
                info.getVectorsCount());
            
            // 索引向量数（已建立HNSW索引的向量数）
            meterRegistry.gauge("qdrant.indexed.vectors.count",
                Tags.of("collection", collectionName),
                info.getIndexedVectorsCount());
            
            // 分片数量
            meterRegistry.gauge("qdrant.segments.count",
                Tags.of("collection", collectionName),
                info.getSegmentsCount());
            
            // 磁盘使用量（字节）
            meterRegistry.gauge("qdrant.disk.usage.bytes",
                Tags.of("collection", collectionName),
                info.getStatus().getOptimizersStatus().getIndexingThresholdKb() * 1024L);
            
            log.debug("Qdrant指标采集完成: 向量数={}, 已索引={}", 
                      info.getVectorsCount(), info.getIndexedVectorsCount());
                      
        } catch (Exception e) {
            log.error("Qdrant指标采集失败", e);
            meterRegistry.counter("qdrant.metrics.collection.error",
                Tags.of("collection", collectionName)).increment();
        }
    }
    
    /**
     * 记录查询延迟
     */
    public void recordSearchLatency(String queryType, long durationMs, 
                                     int topK, double avgSimilarity) {
        Tags tags = Tags.of(
            "collection", collectionName,
            "query_type", queryType,
            "top_k", String.valueOf(topK)
        );
        
        meterRegistry.timer("qdrant.search.duration", tags)
                     .record(durationMs, TimeUnit.MILLISECONDS);
        
        meterRegistry.summary("qdrant.search.similarity", tags)
                     .record(avgSimilarity);
    }
    
    /**
     * Check collection health once a minute
     */
    @Scheduled(fixedDelay = 60000)
    public void checkCollectionHealth() {
        try {
            CollectionInfo info = qdrantClient
                .getCollectionInfoAsync(collectionName).get();
            
            // A large share of unindexed vectors may point to an indexing backlog.
            // Segments below the optimizer's indexing_threshold are never HNSW-indexed,
            // so small collections can legitimately show a low ratio.
            long totalVectors = info.getVectorsCount();
            long indexedVectors = info.getIndexedVectorsCount();
            double indexRatio = totalVectors > 0 
                ? (double) indexedVectors / totalVectors : 1.0;
            
            if (indexRatio < 0.95) {
                // SLF4J placeholders are plain {}, so format the percentage explicitly
                log.warn("Low Qdrant index ratio: {}% (total vectors={}, indexed={})",
                         String.format("%.1f", indexRatio * 100), totalVectors, indexedVectors);
                meterRegistry.counter("qdrant.health.low_index_ratio",
                    Tags.of("collection", collectionName)).increment();
            }
            
        } catch (Exception e) {
            log.error("Qdrant health check failed; the collection may be unreachable", e);
            meterRegistry.counter("qdrant.health.check.failed",
                Tags.of("collection", collectionName)).increment();
        }
    }
}
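The gauges in the metrics collector pass a primitive `long` straight into `meterRegistry.gauge(name, tags, number)`. Micrometer keeps only a weak reference to that boxed, immutable number and returns the already-registered gauge on later calls, so the exported value can freeze at its first reading or turn into NaN after garbage collection. A common remedy is to register one long-lived `AtomicLong` per metric and call `set()` on every poll. Below is a stdlib-only sketch of that holder pattern; `GaugeHolders` and the key format are hypothetical, and the comment marks where the real `meterRegistry.gauge(...)` registration would go.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical helper: one long-lived, mutable AtomicLong per metric key.
 * In a Micrometer setup the factory lambda is where the gauge would be registered
 * exactly once, e.g. meterRegistry.gauge(name, tags, new AtomicLong()),
 * which returns the AtomicLong it was given.
 */
public class GaugeHolders {
    private final Map<String, AtomicLong> holders = new ConcurrentHashMap<>();

    /** Returns the holder for a metric key, creating it only on first use. */
    AtomicLong holder(String key) {
        return holders.computeIfAbsent(key, k -> new AtomicLong());
    }

    /** Called on every scheduled collection: update in place, never re-register. */
    void update(String key, long value) {
        holder(key).set(value);
    }

    public static void main(String[] args) {
        GaugeHolders gauges = new GaugeHolders();
        String key = "qdrant.segments.count|collection=kb"; // hypothetical key format
        AtomicLong observed = gauges.holder(key);            // the object a gauge would read
        gauges.update(key, 8);
        gauges.update(key, 12);
        System.out.println(observed.get()); // 12
    }
}
```

Because the gauge observes the same mutable object for its whole lifetime, each scheduled poll only has to call `update`, and the registry always reports the latest value.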

Grafana dashboard configuration

{
  "dashboard": {
    "title": "Vector DB HNSW monitoring",
    "panels": [
      {
        "title": "Vector search latency (P99 / P50)",
        "type": "timeseries",
        "targets": [
          {
            "expr": "histogram_quantile(0.99, sum by (le) (rate(qdrant_search_duration_seconds_bucket[5m]))) * 1000",
            "legendFormat": "P99 latency (ms)"
          },
          {
            "expr": "histogram_quantile(0.50, sum by (le) (rate(qdrant_search_duration_seconds_bucket[5m]))) * 1000",
            "legendFormat": "P50 latency (ms)"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "unit": "ms",
            "thresholds": {
              "mode": "absolute",
              "steps": [
                {"color": "green", "value": null},
                {"color": "yellow", "value": 200},
                {"color": "red", "value": 500}
              ]
            }
          },
          "overrides": []
        }
      },
      {
        "title": "Vector index ratio",
        "type": "gauge",
        "targets": [
          {
            "expr": "qdrant_indexed_vectors_count / qdrant_vectors_count * 100",
            "legendFormat": "Index ratio %"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "min": 0, "max": 100, "unit": "percent",
            "thresholds": {
              "mode": "absolute",
              "steps": [
                {"color": "red", "value": null},
                {"color": "yellow", "value": 80},
                {"color": "green", "value": 95}
              ]
            }
          },
          "overrides": []
        }
      },
      {
        "title": "Search similarity distribution",
        "type": "heatmap",
        "targets": [
          {
            "expr": "sum by (le) (rate(qdrant_search_similarity_bucket[5m]))",
            "format": "heatmap",
            "legendFormat": "{{le}}"
          }
        ]
      }
    ]
  }
}
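Both latency panels use `histogram_quantile`, which never sees individual latencies: it only has the cumulative `_bucket` counts and interpolates linearly inside the bucket where the requested rank falls. The simplified Java re-implementation below (a single histogram, positive bucket bounds, last bucket `+Inf`; `HistogramQuantileSketch` is a hypothetical name, not part of any library) illustrates why P99 precision is limited by the bucket boundaries you publish.

```java
/**
 * Simplified, illustrative re-implementation of Prometheus histogram_quantile()
 * for one histogram. Assumes ascending positive upper bounds (the last one +Inf)
 * and cumulative counts, as exported in *_bucket series.
 */
public class HistogramQuantileSketch {

    static double quantile(double q, double[] upperBounds, double[] cumulativeCounts) {
        int n = upperBounds.length;
        if (n < 2 || q <= 0 || q > 1) return Double.NaN;   // simplified input guard
        double total = cumulativeCounts[n - 1];            // the +Inf bucket holds the total count
        if (total == 0) return Double.NaN;
        double rank = q * total;

        // First finite bucket whose cumulative count reaches the rank
        int b = 0;
        while (b < n - 1 && cumulativeCounts[b] < rank) b++;
        if (b == n - 1) return upperBounds[n - 2];        // rank lands in +Inf: report the last finite bound

        double lower = (b == 0) ? 0.0 : upperBounds[b - 1];
        double countBelow = (b == 0) ? 0.0 : cumulativeCounts[b - 1];
        double countInBucket = cumulativeCounts[b] - countBelow;
        // Linear interpolation: assumes observations are spread evenly inside the bucket
        return lower + (upperBounds[b] - lower) * (rank - countBelow) / countInBucket;
    }

    public static void main(String[] args) {
        // Made-up latency buckets (seconds) and their cumulative counts
        double[] le = {0.05, 0.1, 0.25, 0.5, Double.POSITIVE_INFINITY};
        double[] counts = {50, 80, 95, 99, 100};
        System.out.println(quantile(0.99, le, counts) * 1000 + " ms"); // 500.0 ms
        System.out.println(quantile(0.90, le, counts) * 1000 + " ms");
    }
}
```

With these made-up counts the P99 lands exactly on the 0.5 s bucket bound: every latency between 0.25 s and 0.5 s looks the same to the query, so it pays to place bucket boundaries around your latency targets.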

Vector database capacity planning

Plan capacity before the project starts to avoid performance surprises after launch.

Capacity formula

import lombok.Builder;
import lombok.Data;

/**
 * Vector storage capacity planner
 * 
 * Estimates the memory needed at different scales
 */
public class VectorCapacityPlanner {
    
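    /**
     * Estimate the memory needed to store a given number of vectors
     * 
     * @param vectorCount number of vectors
     * @param dimension   vector dimension
     * @param m           HNSW M parameter
     * @param withPQ      whether PQ compression is enabled
     * @return a capacity report; sizes are in GiB (1024^3 bytes)
     */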
    public static CapacityReport estimate(long vectorCount, int dimension, 
                                           int m, boolean withPQ) {
        
        // Raw vectors: 4 bytes per dimension (float32)
        long rawVectorBytes = vectorCount * dimension * 4L;
        
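        // HNSW graph: layer 0 keeps up to 2*M neighbors per node; assume 8 bytes per neighbor ID.
        // Rough estimate: implementations storing 4-byte IDs need less, and upper-layer links are not counted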
        long hnswIndexBytes = vectorCount * m * 2 * 8L;
        
        // Payload (metadata): assume about 300 bytes per record
        long payloadBytes = vectorCount * 300L;
        
        // Vector size after PQ compression (assuming an 8x compression ratio)
        long pqVectorBytes = withPQ ? rawVectorBytes / 8 : rawVectorBytes;
        
        long totalBytes = pqVectorBytes + hnswIndexBytes + payloadBytes;
        
        // Recommended memory = total * 1.3 (30% headroom)
        long recommendedMemoryBytes = (long)(totalBytes * 1.3);
        
        return CapacityReport.builder()
            .vectorCount(vectorCount)
            .dimension(dimension)
            .m(m)
            .withPQ(withPQ)
            .rawVectorGB(rawVectorBytes / (1024.0 * 1024 * 1024))
            .hnswIndexGB(hnswIndexBytes / (1024.0 * 1024 * 1024))
            .payloadGB(payloadBytes / (1024.0 * 1024 * 1024))
            .totalGB(totalBytes / (1024.0 * 1024 * 1024))
            .recommendedMemoryGB(recommendedMemoryBytes / (1024.0 * 1024 * 1024))
            .build();
    }
    
    /**
     * Print a capacity planning report
     */
    public static void printReport(long vectorCount, int dimension) {
        System.out.printf("%n=== Vector DB capacity plan ===%n");
        System.out.printf("Vectors:   %,d%n", vectorCount);
        System.out.printf("Dimension: %d%n%n", dimension);
        
        System.out.printf("%-6s %-6s %-12s %-12s %-12s%n", 
                          "M", "PQ", "HNSW index", "Total", "Recommended");
        System.out.println("-".repeat(52));
        
        for (int m : new int[]{16, 24, 32}) {
            for (boolean pq : new boolean[]{false, true}) {
                CapacityReport r = estimate(vectorCount, dimension, m, pq);
                // Format each size with its unit first so the column padding covers "GB" too
                System.out.printf("%-6d %-6s %-12s %-12s %-12s%n",
                    m, pq ? "yes" : "no",
                    String.format("%.2fGB", r.getHnswIndexGB()),
                    String.format("%.2fGB", r.getTotalGB()),
                    String.format("%.2fGB", r.getRecommendedMemoryGB()));
            }
        }
    }
    
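    public static void main(String[] args) {
        // Example: 1.5 million 1536-dimensional vectors
        printReport(1_500_000, 1536);
        
        // Expected table (sizes in GiB, printed as "GB"):
        // M      PQ     HNSW index   Total        Recommended
        // ----------------------------------------------------
        // 16     no     0.36GB       9.36GB       12.17GB
        // 16     yes    0.36GB       1.85GB       2.40GB
        // 24     no     0.54GB       9.54GB       12.40GB
        // 24     yes    0.54GB       2.03GB       2.64GB
        // 32     no     0.72GB       9.72GB       12.63GB
        // 32     yes    0.72GB       2.21GB       2.87GB
    }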
}

@Data
@Builder
class CapacityReport {
    private long vectorCount;
    private int dimension;
    private int m;
    private boolean withPQ;
    private double rawVectorGB;
    private double hnswIndexGB;
    private double payloadGB;
    private double totalGB;
    private double recommendedMemoryGB;
}
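Written out as equations, the estimator computes the following. The symbols are introduced here for readability and are not from the code: N is the vector count, d the dimension, M the HNSW parameter, and c the assumed PQ compression factor (8 with PQ, 1 without). The last two lines plug in the 1.5-million × 1536-dim example with M = 16 and no PQ.

```latex
\begin{aligned}
B_{\text{vec}} &= \frac{4 N d}{c}, \qquad B_{\text{hnsw}} = 8 \cdot 2M \cdot N, \qquad B_{\text{payload}} = 300\,N \\
B_{\text{total}} &= B_{\text{vec}} + B_{\text{hnsw}} + B_{\text{payload}}, \qquad B_{\text{rec}} = 1.3\,B_{\text{total}} \\
B_{\text{total}} &= 9.216\times10^{9} + 3.84\times10^{8} + 4.5\times10^{8} = 1.005\times10^{10}\ \text{bytes} \approx 9.36\ \text{GiB} \\
B_{\text{rec}} &= 1.3065\times10^{10}\ \text{bytes} \approx 12.17\ \text{GiB}
\end{aligned}
```

Raw vectors account for about 92% of the total in this example, which is why PQ shrinks memory far more than lowering M: under this model, going from M = 16 to M = 32 adds only about 0.36 GiB.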

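Recommended configurations by business scale

Scale	Vector count	Recommended config	Memory needed	Typical hardware
Small (knowledge base, FAQ)	< 100K	M=16, ef=64, no PQ	< 1GB	2 vCPU / 4GB RAM
Medium (enterprise knowledge base)	100K-1M	M=32, ef=64, no PQ	4-12GB	4 vCPU / 16GB RAM
Large (industry knowledge base)	1M-10M	M=32, ef=64, PQ enabled	4-8GB	8 vCPU / 32GB RAM
Extra large (internet scale)	> 10M	Sharding + PQ	Distributed	Cluster

The memory column is a ballpark figure: it depends heavily on vector dimension, payload size and whether vectors or the HNSW graph are kept on disk, so check it against the estimator above with your own numbers.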

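Summary

At its core, HNSW tuning means finding the point that best balances recall, query speed and memory for your workload.

Core principles:

Measure first, then tune: let benchmark data decide, not gut feeling
ef is the easiest lever: adjust ef per query at search time, with no index rebuild
M=32 is the sweet spot for most scenarios: the best balance of accuracy and memory
The large-scale trio: on-disk storage + PQ compression + sharding
Monitor from day one: keep watching the index ratio and query latency after launch, and act on problems early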
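Measuring first mostly means measuring recall: run the same queries once through HNSW and once exhaustively (for example with Qdrant's `exact` search parameter), then compare the returned IDs. A minimal stdlib sketch of recall@k, assuming you already have both ID lists for each query; `RecallAtK` and its methods are hypothetical names:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: recall@k of approximate (HNSW) results versus exact results. */
public class RecallAtK {

    /** recall@k for one query: |approx top-k ∩ exact top-k| / k. */
    static double recallAtK(List<Long> approxIds, List<Long> exactIds, int k) {
        Set<Long> truth = new HashSet<>(exactIds.subList(0, Math.min(k, exactIds.size())));
        long hits = approxIds.stream().limit(k).filter(truth::contains).count();
        return (double) hits / k;
    }

    /** Mean recall@k over a batch of queries (lists are aligned by query index). */
    static double meanRecallAtK(List<List<Long>> approx, List<List<Long>> exact, int k) {
        if (approx.isEmpty()) return 0.0;
        double sum = 0;
        for (int i = 0; i < approx.size(); i++) {
            sum += recallAtK(approx.get(i), exact.get(i), k);
        }
        return sum / approx.size();
    }

    public static void main(String[] args) {
        List<List<Long>> exact = List.of(List.of(1L, 2L, 3L, 4L), List.of(5L, 6L, 7L, 8L));
        List<List<Long>> approx = List.of(List.of(1L, 2L, 9L, 4L), List.of(5L, 6L, 7L, 8L));
        System.out.println(meanRecallAtK(approx, exact, 4)); // 0.875
    }
}
```

Running this over a few hundred representative queries for each candidate ef or M value yields recall figures comparable to the ones in the tuning table at the start of the article.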

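When Wang Jianfeng and I talked about it later, he said something that stuck with me:

"A vector database isn't something you configure once and forget. It's like an engine: only when you understand how it works can you run it at full speed."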