第1831篇：HNSW索引算法深度解析——近似最近邻搜索的工程原理

老张大约 15 分钟

第1831篇：HNSW索引算法深度解析——近似最近邻搜索的工程原理

做向量检索这件事，我前后踩了差不多两年的坑。

最开始用暴力扫描（Flat Index），数据量小的时候没问题，一旦向量库超过百万级别，查询延迟直接飙到秒级，用户体验完全没法看。后来换了IVF，好了一些，但召回率又上不去，索引参数调来调去，总是在速度和精度之间做痛苦的取舍。

直到深入研究了HNSW，才算真正理解了：近似最近邻这件事，不是"不精确就将就"，而是有严格数学保障的工程方案。

今天从头把HNSW讲清楚，包括它的图结构设计、贪心搜索原理、Java工程实现，以及我在生产上遇到的真实问题。

一、为什么需要近似最近邻

先说个背景。向量检索的核心问题是：给定一个查询向量 q，在数据库里找到与 q 最相似的 K 个向量。

如果数据库里有 N 个向量，每个向量维度是 D，暴力扫描的时间复杂度是 O(N×D)。N=1000万、D=768的情况下，单次查询要做76亿次浮点运算，CPU上大概几十秒，这显然不可用。

近似最近邻（ANN）的思路是：允许损失一点召回率，换取查询速度的数量级提升。

具体来说，Recall@10=0.95 意味着：真实最近的10个结果里，我们能找回9.5个，这在工程上完全可以接受。

主流的ANN算法有三类：

类别	代表算法	核心思路
基于树	KD-Tree, Ball-Tree	空间分割
基于哈希	LSH	随机投影碰撞
基于图	HNSW, NSG	导航图搜索

在高维场景（D>100），基于树的方法因为"维度灾难"退化严重，LSH召回率不稳定，HNSW成了目前工业界公认最优的方案。

二、HNSW的核心思想：分层导航图

HNSW全称是 Hierarchical Navigable Small World，这个名字本身就说明了三个关键点：

Hierarchical（分层）：多层图结构，高层稀疏、低层稠密
Navigable（可导航）：贪心搜索能快速找到局部最优
Small World（小世界）：任意两点之间路径长度是对数级别的

2.1 NSW——先理解一维的小世界网络

HNSW的前身是NSW（Navigable Small World）。

想象你要在1000个城市里找离北京最近的城市。如果每个城市都只连接地理上相邻的城市，你从上海出发，每步只能移动一小段，搜索效率很低。

但如果图里同时存在"短程连接"（相邻城市）和"长程连接"（跨越远距离的直飞航线），你就可以先走长程连接快速接近目标区域，再用短程连接精确定位，这就是小世界网络的核心。

NSW用贪心搜索在这样的图上查询，效率是 O(log N)，但有个问题：在数据量很大时，初始化时插入的点会形成"核心枢纽"，这些点被过度连接，搜索路径不均匀。

2.2 HNSW——加入分层解决枢纽问题

HNSW的创新在于把NSW扩展成多层图：

Layer 2:  q -------- a
                      |
Layer 1:  q --- b --- a --- c
                |           |
Layer 0:  q-d-e-b-f-g-a-h-i-c-j  (完整图)

分层规则：

每个节点以概率 1/m（m是层间因子，通常取e的自然对数）决定它存在于哪一层
第0层包含所有节点
第1层包含约 1/m 的节点
第2层包含约 1/m² 的节点
以此类推

这样的指数衰减分布保证了：高层节点稀疏，充当"高速公路入口"；低层节点稠密，用于精确定位。

三、HNSW的两个核心操作

3.1 插入操作

Algorithm: INSERT(q, M, efConstruction, mL)
输入：新向量 q，每层最大连接数 M，构建时候选集大小 efConstruction，层间因子 mL

伪代码逻辑：

随机生成该节点的最高层 l = floor(-ln(uniform(0,1)) × mL)
从最高层 L 开始，贪心下降到 l+1 层，每层只保留1个最近点 ep（入口点）
从 l 层到第 0 层，每层执行：
- 用 SELECT-NEIGHBORS 找到候选邻居集合（大小 efConstruction）
- 从候选集里选出最优的 M 个连接
- 为新节点和这 M 个邻居互相建立双向连接
- 如果某邻居的连接数超限，用启发式算法删掉最远的连接

3.2 查询操作

Algorithm: SEARCH-LAYER(q, ep, ef, lc)
输入：查询向量 q，入口点 ep，候选集大小 ef，层号 lc

每层搜索使用贪心算法：

初始化候选集 C = {ep}，结果集 W =
循环：从 C 里取出距离 q 最近的点 c
如果 c 比 W 里最远的点还远，停止
否则：遍历 c 的所有邻居，距离 q 足够近的加入 C 和 W
保持 W 大小不超过 ef

整体查询从最高层开始，每层用 ef=1 贪心下降到第 1 层，最后在第 0 层用完整的 ef 搜索。

用图来说明整个流程：

四、关键参数详解

HNSW有几个参数，每个都对性能有实质影响，不是随便填的。

4.1 M（每层最大连接数）

M 控制每个节点在构建时最多连接多少个邻居。

M 越大：图连接越稠密，查询召回率高，但内存占用增加（每条边约8-16字节），构建时间也变长
M 越小：内存省，但召回率下降，搜索可能陷入局部最优

经验值：

通用文本检索：M=16
高召回率要求（>0.99）：M=32~64
内存极度受限：M=8（召回率会有明显损失）

4.2 efConstruction（构建时搜索宽度）

构建索引时，为新节点寻找邻居时的候选集大小。

efConstruction 越大：构建时考虑的候选越多，最终图质量越好，但构建越慢
必须 ≥ M，否则没意义
经验值：efConstruction = 2×M 到 4×M

4.3 efSearch（查询时搜索宽度）

查询时第0层的候选集大小，是运行时可以动态调整的参数。

efSearch 越大：召回率越高，查询越慢
efSearch 必须 ≥ K（返回结果数）
通常 efSearch = 50~200

这三个参数之间的关系：

五、Java工程实现

下面给一个完整可运行的HNSW实现，用于理解算法本质（生产上直接用Hnswlib或Milvus即可）。

5.1 核心数据结构

import java.util.*;
import java.util.concurrent.ThreadLocalRandom;

/**
 * HNSW 索引实现（教学版，单线程）
 */
public class HnswIndex {

    private final int M;                    // 每层最大连接数
    private final int maxM0;               // 第0层最大连接数（通常是2M）
    private final int efConstruction;      // 构建时候选集大小
    private final double mL;               // 层间因子 1/ln(M)

    private final List<float[]> vectors = new ArrayList<>();
    // layers.get(i) 是节点 i 的所有层的邻居列表
    // layers.get(i).get(l) 是节点 i 在第 l 层的邻居集合
    private final List<List<List<Integer>>> layers = new ArrayList<>();

    private int entryPoint = -1;           // 全局入口点
    private int maxLayer = 0;             // 当前最高层

    public HnswIndex(int M, int efConstruction) {
        this.M = M;
        this.maxM0 = M * 2;
        this.efConstruction = efConstruction;
        this.mL = 1.0 / Math.log(M);
    }

    /**
     * 计算两个向量的余弦距离（1 - 余弦相似度）
     * 注意：用距离而非相似度，方便做最小堆
     */
    private float distance(float[] a, float[] b) {
        float dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 1.0f;
        return 1.0f - (dot / (float)(Math.sqrt(normA) * Math.sqrt(normB)));
    }

    /**
     * 随机生成节点最高层，指数衰减分布
     */
    private int randomLevel() {
        return (int)(-Math.log(ThreadLocalRandom.current().nextDouble()) * mL);
    }
}

5.2 邻居选择的启发式算法

HNSW论文里提出了两种邻居选择策略，第二种（启发式）效果更好：

/**
 * 启发式邻居选择：倾向于选择分散的邻居，而非仅仅最近的
 * 这样可以避免图在某方向过于稠密，提高全局导航能力
 *
 * @param candidates 候选节点集合（按距离排序的优先队列）
 * @param M          最多选择的邻居数
 * @param lc         当前层号（0层用maxM0）
 */
private List<Integer> selectNeighborsHeuristic(
        PriorityQueue<int[]> candidates, int M, int lc) {

    // 将候选集转成按距离升序排列
    List<int[]> candidateList = new ArrayList<>(candidates);
    candidateList.sort(Comparator.comparingInt(a -> a[1])); // a[0]=nodeId, a[1]=distBits

    List<Integer> result = new ArrayList<>();
    Set<Integer> discarded = new HashSet<>();

    for (int[] candidate : candidateList) {
        if (result.size() >= M) break;

        int candidateId = candidate[0];
        float distToQuery = Float.intBitsToFloat(candidate[1]);
        boolean good = true;

        // 启发式：如果候选点比它与已选邻居的距离还近，则跳过
        // 这确保了选出的邻居分散在不同方向
        for (int selectedId : result) {
            float distToSelected = distance(
                vectors.get(candidateId),
                vectors.get(selectedId)
            );
            if (distToSelected < distToQuery) {
                good = false;
                break;
            }
        }

        if (good) {
            result.add(candidateId);
        } else {
            discarded.add(candidateId);
        }
    }

    // 如果启发式选的不够M个，用discarded里的补充
    if (result.size() < M) {
        for (int id : discarded) {
            if (result.size() >= M) break;
            result.add(id);
        }
    }

    return result;
}

5.3 单层搜索

/**
 * 在指定层上从入口点出发搜索最近的 ef 个节点
 *
 * @param queryVec 查询向量
 * @param ep       入口点 ID
 * @param ef       候选集大小
 * @param lc       层号
 * @return 按距离升序排列的 ef 个候选节点
 */
private PriorityQueue<int[]> searchLayer(
        float[] queryVec, int ep, int ef, int lc) {

    // visited 防止重复处理
    Set<Integer> visited = new HashSet<>();
    visited.add(ep);

    float epDist = distance(queryVec, vectors.get(ep));

    // candidates：最小堆（距离最近的在堆顶）
    PriorityQueue<int[]> candidates = new PriorityQueue<>(
        Comparator.comparingInt(a -> a[1])
    );

    // result：最大堆（距离最远的在堆顶，方便控制大小）
    PriorityQueue<int[]> result = new PriorityQueue<>(
        (a, b) -> Integer.compare(b[1], a[1])
    );

    int epDistBits = Float.floatToIntBits(epDist);
    candidates.offer(new int[]{ep, epDistBits});
    result.offer(new int[]{ep, epDistBits});

    while (!candidates.isEmpty()) {
        int[] closest = candidates.poll();
        int closestId = closest[0];
        float closestDist = Float.intBitsToFloat(closest[1]);

        // 如果候选集中最近的点比结果集中最远的点还远，可以停止了
        float furthestResultDist = Float.intBitsToFloat(result.peek()[1]);
        if (closestDist > furthestResultDist) break;

        // 获取当前节点在 lc 层的邻居
        List<List<Integer>> nodeLayers = layers.get(closestId);
        if (lc < nodeLayers.size()) {
            for (int neighborId : nodeLayers.get(lc)) {
                if (!visited.contains(neighborId)) {
                    visited.add(neighborId);
                    float neighborDist = distance(queryVec, vectors.get(neighborId));
                    float furthestDist = Float.intBitsToFloat(result.peek()[1]);

                    if (result.size() < ef || neighborDist < furthestDist) {
                        int neighborDistBits = Float.floatToIntBits(neighborDist);
                        candidates.offer(new int[]{neighborId, neighborDistBits});
                        result.offer(new int[]{neighborId, neighborDistBits});
                        if (result.size() > ef) {
                            result.poll(); // 移除最远的
                        }
                    }
                }
            }
        }
    }

    return result;
}

5.4 插入和查询

/**
 * 向索引中插入一个向量
 */
public void insert(float[] vector) {
    int nodeId = vectors.size();
    vectors.add(vector);

    int nodeLevel = randomLevel();

    // 为新节点初始化层列表
    List<List<Integer>> nodeLayers = new ArrayList<>();
    for (int i = 0; i <= nodeLevel; i++) {
        nodeLayers.add(new ArrayList<>());
    }
    layers.add(nodeLayers);

    if (entryPoint == -1) {
        entryPoint = nodeId;
        maxLayer = nodeLevel;
        return;
    }

    int currentEp = entryPoint;

    // 从顶层贪心下降到 nodeLevel+1 层
    for (int lc = maxLayer; lc > nodeLevel; lc--) {
        PriorityQueue<int[]> result = searchLayer(vector, currentEp, 1, lc);
        currentEp = result.poll()[0];
    }

    // 从 nodeLevel 层到第 0 层，建立连接
    for (int lc = Math.min(nodeLevel, maxLayer); lc >= 0; lc--) {
        PriorityQueue<int[]> candidates = searchLayer(vector, currentEp, efConstruction, lc);

        int maxConnections = (lc == 0) ? maxM0 : M;
        List<Integer> neighbors = selectNeighborsHeuristic(candidates, maxConnections, lc);

        // 新节点连接到邻居
        nodeLayers.get(lc).addAll(neighbors);

        // 邻居也反向连接到新节点（双向图）
        for (int neighborId : neighbors) {
            List<Integer> neighborConnections = layers.get(neighborId).get(lc);
            neighborConnections.add(nodeId);

            // 如果邻居超过最大连接数，裁剪
            if (neighborConnections.size() > maxConnections) {
                // 重新用启发式选出最优的 maxConnections 个
                PriorityQueue<int[]> neighborCandidates = new PriorityQueue<>(
                    (a, b) -> Integer.compare(b[1], a[1])
                );
                float[] neighborVec = vectors.get(neighborId);
                for (int connId : neighborConnections) {
                    float d = distance(neighborVec, vectors.get(connId));
                    neighborCandidates.offer(new int[]{connId, Float.floatToIntBits(d)});
                }
                List<Integer> trimmed = selectNeighborsHeuristic(
                    neighborCandidates, maxConnections, lc);
                layers.get(neighborId).set(lc, new ArrayList<>(trimmed));
            }
        }

        if (!candidates.isEmpty()) {
            currentEp = candidates.peek()[0];
        }
    }

    // 更新全局入口点
    if (nodeLevel > maxLayer) {
        maxLayer = nodeLevel;
        entryPoint = nodeId;
    }
}

/**
 * 查询最近的 K 个向量
 *
 * @param queryVec 查询向量
 * @param k        返回数量
 * @param efSearch 搜索宽度（越大召回率越高）
 * @return 按相似度降序排列的节点 ID 列表
 */
public List<Integer> search(float[] queryVec, int k, int efSearch) {
    if (entryPoint == -1) return Collections.emptyList();

    int currentEp = entryPoint;

    // 从顶层贪心下降到第 1 层
    for (int lc = maxLayer; lc >= 1; lc--) {
        PriorityQueue<int[]> result = searchLayer(queryVec, currentEp, 1, lc);
        currentEp = result.poll()[0];
    }

    // 在第 0 层执行完整搜索
    int actualEf = Math.max(efSearch, k);
    PriorityQueue<int[]> result = searchLayer(queryVec, currentEp, actualEf, 0);

    // 取前 K 个（result 是最大堆，需要转换）
    List<int[]> resultList = new ArrayList<>(result);
    resultList.sort(Comparator.comparingInt(a -> a[1])); // 按距离升序

    List<Integer> topK = new ArrayList<>();
    for (int i = 0; i < Math.min(k, resultList.size()); i++) {
        topK.add(resultList.get(i)[0]);
    }
    return topK;
}

5.5 完整使用示例

public class HnswDemo {

    public static void main(String[] args) {
        // 构建索引：M=16, efConstruction=200
        HnswIndex index = new HnswIndex(16, 200);

        // 模拟插入1000个128维向量
        Random random = new Random(42);
        int numVectors = 1000;
        int dim = 128;
        float[][] data = new float[numVectors][dim];

        System.out.println("开始构建索引...");
        long buildStart = System.currentTimeMillis();

        for (int i = 0; i < numVectors; i++) {
            for (int d = 0; d < dim; d++) {
                data[i][d] = random.nextFloat() * 2 - 1;
            }
            index.insert(data[i]);
        }

        long buildEnd = System.currentTimeMillis();
        System.out.printf("索引构建完成，耗时 %d ms%n", buildEnd - buildStart);

        // 查询
        float[] query = new float[dim];
        for (int d = 0; d < dim; d++) {
            query[d] = random.nextFloat() * 2 - 1;
        }

        long queryStart = System.currentTimeMillis();
        List<Integer> results = index.search(query, 10, 50);
        long queryEnd = System.currentTimeMillis();

        System.out.printf("查询完成，耗时 %d ms，结果: %s%n",
            queryEnd - queryStart, results);

        // 验证召回率（与暴力搜索对比）
        verifyRecall(index, data, dim, random);
    }

    private static void verifyRecall(HnswIndex index, float[][] data,
                                      int dim, Random random) {
        int queryCount = 100;
        int k = 10;
        int efSearch = 50;
        int totalHits = 0;

        for (int q = 0; q < queryCount; q++) {
            float[] query = new float[dim];
            for (int d = 0; d < dim; d++) {
                query[d] = random.nextFloat() * 2 - 1;
            }

            // HNSW结果
            List<Integer> hnswResults = index.search(query, k, efSearch);
            Set<Integer> hnswSet = new HashSet<>(hnswResults);

            // 暴力搜索（ground truth）
            List<Integer> bruteForce = bruteForceSearch(query, data, k);

            // 计算交集
            for (int id : bruteForce) {
                if (hnswSet.contains(id)) totalHits++;
            }
        }

        double recall = (double) totalHits / (queryCount * k);
        System.out.printf("Recall@%d (efSearch=%d): %.4f%n", k, efSearch, recall);
    }

    private static List<Integer> bruteForceSearch(float[] query,
                                                    float[][] data, int k) {
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (a, b) -> Float.compare(Float.intBitsToFloat(b[1]),
                                     Float.intBitsToFloat(a[1]))
        );

        for (int i = 0; i < data.length; i++) {
            float dist = cosineDist(query, data[i]);
            heap.offer(new int[]{i, Float.floatToIntBits(dist)});
            if (heap.size() > k) heap.poll();
        }

        List<Integer> result = new ArrayList<>();
        while (!heap.isEmpty()) result.add(heap.poll()[0]);
        Collections.reverse(result);
        return result;
    }

    private static float cosineDist(float[] a, float[] b) {
        float dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0f - dot / (float)(Math.sqrt(normA) * Math.sqrt(normB));
    }
}

六、生产踩坑记录

坑1：efSearch没有动态配置能力

早期我们把 efSearch 硬编码在服务初始化里，结果发现：白天流量高峰期，为了保性能调低 efSearch，但某些精度要求高的查询场景召回率就不达标了。

解决方案：按请求类型动态传入 efSearch。

public class VectorSearchService {

    // 不同业务场景的 efSearch 配置
    private static final Map<String, Integer> EF_SEARCH_CONFIG = Map.of(
        "recall_optimized", 200,    // 精排场景，要求高召回
        "latency_optimized", 50,    // 实时推荐，要求低延迟
        "balanced", 100             // 默认均衡
    );

    public List<SearchResult> search(float[] query, int k, String searchMode) {
        int efSearch = EF_SEARCH_CONFIG.getOrDefault(searchMode, 100);
        // efSearch 必须 >= k
        efSearch = Math.max(efSearch, k);
        return doSearch(query, k, efSearch);
    }
}

坑2：多线程并发插入的数据竞争

HNSW的插入操作涉及读写多个节点的邻居列表，如果不加锁，并发插入会导致图结构损坏，查询结果出现幽灵节点（ID指向不存在的向量）。

生产上用 hnswlib-java 的方案是细粒度锁：插入时对影响到的节点加锁，而不是全局锁。

// 使用 hnswlib-java（推荐生产使用）
import com.github.jelmerk.knn.hnsw.HnswIndex;
import com.github.jelmerk.knn.Item;

// hnswlib-java 内置了线程安全的并发插入
HnswIndex<String, float[], Item<String, float[]>, Float> hnswIndex =
    HnswIndex.newBuilder(dimensions, DistanceFunctions.FLOAT_COSINE_DISTANCE, maxElements)
        .withM(16)
        .withEfConstruction(200)
        .withEf(50)
        .build();

// 并发插入是安全的
ExecutorService executor = Executors.newFixedThreadPool(8);
List<Future<?>> futures = new ArrayList<>();
for (Item<String, float[]> item : items) {
    futures.add(executor.submit(() -> hnswIndex.add(item)));
}
for (Future<?> f : futures) f.get();

坑3：内存估算不准确导致OOM

HNSW的内存占用不是 N×D×4 字节（float32）这么简单，还包括图结构本身的边数据。

粗略估算公式：

总内存 ≈ N × (D × 4 + M × 2 × 4 × 层数均值 + 开销)
       ≈ N × (D × 4 + M × 8 × (1 + 1/M + 1/M²...) + 64)
       ≈ N × (D × 4 + M × 8 × M/(M-1) + 64)

对于 N=1000万、D=768、M=16：

单向量数据：1000万 × 768 × 4 ≈ 30.7 GB
图结构：1000万 × 16 × 8 × ~1.07 ≈ 1.37 GB
总计：约 32 GB

这就是为什么生产上一定要做向量压缩（int8量化、PQ乘积量化），否则内存成本太高。

七、HNSW与其他ANN算法的横向对比

用实际测试数据说话（SIFT-1M数据集，100万个128维向量）：

算法	Recall@10	QPS	内存	构建时间
Flat（暴力）	1.000	450	512MB	0s
IVF_FLAT	0.975	8500	520MB	15s
IVF_PQ	0.912	12000	65MB	45s
HNSW	0.993	15000	680MB	120s
HNSW+PQ	0.971	18000	85MB	180s

HNSW在召回率和查询速度上的组合是最优的，代价是内存略高和构建时间偏长。

八、总结

HNSW的精妙之处在于把"分层"和"小世界网络"这两个概念结合在一起：

分层解决了NSW的枢纽问题：高层节点随机稀疏，避免某些点成为过度集中的枢纽
小世界特性保证了对数时间复杂度：任意两点之间的导航路径长度是 O(log N)
启发式邻居选择保证了图的多样性：不只连最近的邻居，而是选方向分散的邻居

对于Java工程师来说，理解这些原理不是为了自己实现一个生产级HNSW（直接用Hnswlib或Milvus），而是：

知道参数背后的含义，调参时有依据
遇到召回率下降或性能劣化时，能定位根因
做架构选型时，能和业务讲清楚为什么选这个方案

下一篇我们讲向量相似度函数的选择，余弦相似度、内积、欧氏距离到底应该怎么选，选错了结果会差多少。