Post 1913: Elasticsearch Vector Search: Hybrid Queries Combining kNN Search and Full-Text Retrieval

By 老张 (Lao Zhang), about a 9-minute read

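One question bothered me for a long time: when a user searches for "a good-value Android flagship phone", should I use keyword matching or vector search?

Keyword matching reliably catches terms such as "Android" and "flagship", but it has no grasp of a semantic phrase like "good value". Vector search understands semantics, but when the user searches for a specific model such as "小米14 Ultra" (Xiaomi 14 Ultra), it may recall a pile of related phones without guaranteeing that the exact model ranks first.

Elasticsearch 8.x resolves this tension elegantly: a hybrid query that combines kNN vector search with BM25 full-text retrieval. This post takes that approach apart piece by piece.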

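1. How Elasticsearch's Vector Capabilities Evolved

Without this history it is easy to pick the wrong API.

ES 7.3: vector functions such as cosineSimilarity become available for dense_vector fields in script_score queries; scoring is brute force, so performance is very poor at scale
ES 8.0: native approximate kNN search arrives (the _knn_search endpoint, technical preview) on top of HNSW indexing, a qualitative leap in performance
ES 8.4: the knn option is added to the _search API, so a kNN clause can be combined with a BM25 query in the same request
ES 8.8: RRF (Reciprocal Rank Fusion) is introduced for fusing the kNN and BM25 rankings
ES 8.12: int8_hnsw adds vector quantization and cuts memory usage substantially (int4_hnsw follows in 8.15)
ES 8.13: the hybrid search experience is refined further

If you are still on the ES 7.x script-scoring approach, it is time to upgrade.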

2. Environment Setup and Index Configuration

2.1 Docker Deployment

# docker-compose.yml
version: '3.8'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.13.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      - xpack.security.http.ssl.enabled=false
      - ES_JAVA_OPTS=-Xms4g -Xmx4g
    ports:
      - "9200:9200"
    volumes:
      - es_data:/usr/share/elasticsearch/data
  kibana:
    image: docker.elastic.co/kibana/kibana:8.13.0
    ports:
      - "5601:5601"
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
volumes:
  es_data:

2.2 Java Client Dependencies

<dependency>
    <groupId>co.elastic.clients</groupId>
    <artifactId>elasticsearch-java</artifactId>
    <version>8.13.0</version>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
</dependency>

2.3 Index Mapping Design

This is one of the most important parts of the whole post: the mapping design determines which queries you can run.

@Component
@RequiredArgsConstructor
@Slf4j
public class EsIndexManager {

    private final ElasticsearchClient esClient;
    private static final String INDEX_NAME = "products";

    public void createProductIndex() throws IOException {
        // Check whether the index already exists
        boolean exists = esClient.indices()
            .exists(e -> e.index(INDEX_NAME))
            .value();
        if (exists) {
            log.info("Index {} already exists", INDEX_NAME);
            return;
        }

        esClient.indices().create(c -> c
            .index(INDEX_NAME)
            .settings(s -> s
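                // Shard count; tune it to the data volume in production.
                // On a single-node cluster the replica stays unassigned (yellow health).
                .numberOfShards("1")
                .numberOfReplicas("1")
                .refreshInterval(ri -> ri.time("5s"))  // refresh less often to improve write throughput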
            )
            .mappings(m -> m
                .properties("product_id", p -> p
                    .keyword(k -> k))
                .properties("title", p -> p
                    // Supports both full-text search (Chinese tokenization) and exact keyword matching
                    .text(t -> t
                        .analyzer("ik_max_word")
                        .searchAnalyzer("ik_smart")
                        .fields("keyword", f -> f.keyword(k -> k))
                    )
                )
                .properties("description", p -> p
                    .text(t -> t.analyzer("ik_max_word")))
                .properties("category", p -> p
                    .keyword(k -> k))
                .properties("brand", p -> p
                    .keyword(k -> k))
                .properties("price", p -> p
                    .double_(d -> d))
                .properties("sales_count", p -> p
                    .long_(l -> l))
                // Vector field: HNSW index, cosine similarity
                .properties("title_vector", p -> p
                    .denseVector(dv -> dv
                        .dims(1536)
                        .index(true)
                        .similarity("cosine")
                        .indexOptions(io -> io
                            .type("hnsw")
                            .m(16)
                            .efConstruction(100)
                        )
                    )
                )
            )
        );
        log.info("Index {} created", INDEX_NAME);
    }
}

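Note the design of the title field: it serves full-text retrieval (text type with IK tokenization) and also carries a .keyword sub-field for exact matching. title_vector stores the semantic vector of the title, and its dims value must equal the output dimension of the embedding model you use.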
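Because Elasticsearch rejects a document whose vector length differs from the mapped dims, it can help to fail fast on the client before sending anything. The following is a minimal sketch of such a guard; the class name VectorDims and the method requireDims are hypothetical helpers for illustration, not part of the ES client.

```java
import java.util.Objects;

/** Hypothetical guard: rejects embeddings whose length differs from the mapping's dims. */
final class VectorDims {

    // Assumed to mirror the "dims" value declared for title_vector in the mapping (1536).
    static final int TITLE_VECTOR_DIMS = 1536;

    private VectorDims() {}

    /** Returns the vector unchanged when its length matches, otherwise throws. */
    static float[] requireDims(float[] vector, int expectedDims) {
        Objects.requireNonNull(vector, "vector");
        if (vector.length != expectedDims) {
            throw new IllegalArgumentException(
                "embedding has " + vector.length + " dims, mapping expects " + expectedDims);
        }
        return vector;
    }
}
```

A call such as VectorDims.requireDims(embeddingClient.embed(title), VectorDims.TITLE_VECTOR_DIMS) would surface a model/mapping mismatch at the call site rather than as an indexing error.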

3. Writing Data: Indexing Vectors and Text Together

@Service
@RequiredArgsConstructor
@Slf4j
public class ProductIndexService {

    private static final String INDEX_NAME = "products";

    private final ElasticsearchClient esClient;
    private final EmbeddingClient embeddingClient;

    public void indexProduct(ProductDTO product) throws IOException {
        // Compute the embedding of the title
        float[] titleVector = embeddingClient.embed(product.getTitle());

        esClient.index(i -> i
            .index(INDEX_NAME)
            .id(product.getProductId())
            .document(buildDocument(product, titleVector))
        );
    }

    // Build the document body; shared by single and bulk indexing
    private Map<String, Object> buildDocument(ProductDTO product, float[] vector) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("product_id", product.getProductId());
        doc.put("title", product.getTitle());
        doc.put("description", product.getDescription());
        doc.put("category", product.getCategory());
        doc.put("brand", product.getBrand());
        doc.put("price", product.getPrice());
        doc.put("sales_count", product.getSalesCount());
        doc.put("title_vector", vector);
        return doc;
    }

    /**
     * Batch indexing (uses the Bulk API)
     */
    public void bulkIndexProducts(List<ProductDTO> products) throws IOException {
        List<String> titles = products.stream()
            .map(ProductDTO::getTitle)
            .collect(Collectors.toList());

        // Fetch embeddings in one batch (fewer API calls)
        List<float[]> vectors = embeddingClient.embedBatch(titles);

        BulkRequest.Builder bulkBuilder = new BulkRequest.Builder();
        for (int i = 0; i < products.size(); i++) {
            ProductDTO product = products.get(i);
            float[] vector = vectors.get(i);

            Map<String, Object> doc = buildDocument(product, vector);
            bulkBuilder.operations(op -> op
                .index(id -> id
                    .index(INDEX_NAME)
                    .id(product.getProductId())
                    .document(doc)
                )
            );
        }

        BulkResponse response = esClient.bulk(bulkBuilder.build());
        if (response.errors()) {
            long errorCount = response.items().stream()
                .filter(item -> item.error() != null)
                .count();
            log.error("Bulk write: {} documents failed", errorCount);
        } else {
            log.info("Bulk write: {} documents indexed successfully", products.size());
        }
    }
}
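The bulk method above sends every product in a single request, which can grow very large once each document carries a 1536-dimensional vector. One common way to bound the request size is to split the input into fixed-size batches first. The sketch below shows only that partitioning step; BulkBatches and partition are hypothetical names, and the right batch size depends on your cluster.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a list into consecutive batches of at most batchSize items. */
final class BulkBatches {

    private BulkBatches() {}

    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the slice so each batch is independent of the source list
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

A caller could then loop over BulkBatches.partition(products, 500) and pass each batch to bulkIndexProducts; the value 500 is only an illustrative starting point.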

4. kNN Vector Search

4.1 Pure Vector Search

public List<ProductSearchResult> knnSearch(
        String queryText, int topK) throws IOException {

    float[] queryVector = embeddingClient.embed(queryText);

    SearchResponse<Map> response = esClient.search(s -> s
        .index(INDEX_NAME)
        .knn(k -> k
            .field("title_vector")
            .queryVector(toFloatList(queryVector))
            .k(topK)
            .numCandidates(topK * 5)  // candidate count: larger means better recall but slower
        )
        .size(topK),
        Map.class
    );

    return parseResults(response);
}

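The numCandidates parameter is critical. It controls how many candidates the HNSW search collects on each shard before the top k are selected, similar to ef_search in pgvector. Larger values give higher recall and higher latency. A rule of thumb is 5 to 10 times k.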
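The search snippets in this post call toFloatList without showing it, and they derive numCandidates from k inline. Here is one possible sketch of both pieces. KnnParams and its methods are hypothetical helpers; the 10,000 cap reflects the documented upper bound on num_candidates, and the multiplier is the rule-of-thumb factor from the paragraph above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helpers for building kNN request parameters. */
final class KnnParams {

    // Elasticsearch rejects num_candidates values above 10,000.
    static final int MAX_NUM_CANDIDATES = 10_000;

    private KnnParams() {}

    /** Boxes a float[] embedding into the List<Float> the request builder expects. */
    static List<Float> toFloatList(float[] vector) {
        List<Float> list = new ArrayList<>(vector.length);
        for (float v : vector) {
            list.add(v);
        }
        return list;
    }

    /** Returns k * multiplier, kept within [k, MAX_NUM_CANDIDATES]. */
    static int numCandidates(int k, int multiplier) {
        if (k <= 0 || k > MAX_NUM_CANDIDATES || multiplier <= 0) {
            throw new IllegalArgumentException("k must be in 1..10000 and multiplier positive");
        }
        long wanted = (long) k * multiplier;  // long avoids int overflow
        return (int) Math.max(k, Math.min(wanted, MAX_NUM_CANDIDATES));
    }
}
```

With these in place, .numCandidates(topK * 5) could be written as KnnParams.numCandidates(topK, 5), which keeps large topK values from producing a request the server would reject.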

4.2 kNN with Filter Conditions (Pre-filtering)

public List<ProductSearchResult> filteredKnnSearch(
        String queryText, String category,
        double minPrice, double maxPrice, int topK) throws IOException {

    float[] queryVector = embeddingClient.embed(queryText);

    SearchResponse<Map> response = esClient.search(s -> s
        .index(INDEX_NAME)
        .knn(k -> k
            .field("title_vector")
            .queryVector(toFloatList(queryVector))
            .k(topK)
            .numCandidates(topK * 5)
            // The kNN filter is pre-filtering: it is applied during the vector search, not afterwards
            .filter(f -> f
                .bool(b -> b
                    .must(m -> m.term(t -> t
                        .field("category").value(category)))
                    .must(m -> m.range(r -> r
                        .field("price")
                        .gte(JsonData.of(minPrice))
                        .lte(JsonData.of(maxPrice))
                    ))
                )
            )
        )
        .size(topK),
        Map.class
    );

    return parseResults(response);
}

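kNN in ES 8.x accepts a filter clause, and this is how pre-filtering is implemented: the filter is applied while the HNSW graph is traversed, so the search still aims to return k documents that satisfy it. Pre-filtering works well when the filter matches a large share of documents (say 50% or more). When the filter is very restrictive (only about 1% of documents match), far fewer of the visited nodes qualify as candidates, so recall can drop and latency can rise; in that case increase numCandidates.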

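5. Hybrid Search: Best Practices for kNN + BM25

This is the main event. Since ES 8.4 a single search request can carry both a kNN clause and a BM25 query; from ES 8.8 the two rankings can then be fused with the RRF (Reciprocal Rank Fusion) algorithm.

5.1 Hybrid Search with RRF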

public List<ProductSearchResult> hybridSearch(
        String queryText, int topK) throws IOException {

    float[] queryVector = embeddingClient.embed(queryText);

    SearchResponse<Map> response = esClient.search(s -> s
        .index(INDEX_NAME)
        // kNN retrieval
        .knn(k -> k
            .field("title_vector")
            .queryVector(toFloatList(queryVector))
            .k(topK * 2)       // fetch more than topK to leave room for RRF fusion
            .numCandidates(topK * 10)
        )
        // BM25 full-text retrieval
        .query(q -> q
            .bool(b -> b
                .should(sh -> sh
                    .match(m -> m
                        .field("title")
                        .query(queryText)
                        .boost(2.0f)   // title carries more weight
                    )
                )
                .should(sh -> sh
                    .match(m -> m
                        .field("description")
                        .query(queryText)
                        .boost(1.0f)
                    )
                )
            )
        )
        // RRF rank fusion
        .rank(r -> r
            .rrf(rrf -> rrf
                .rankConstant(60L)   // the k constant in RRF; larger values flatten rank differences
                .windowSize((long) topK * 2)
            )
        )
        .size(topK),
        Map.class
    );

    return parseResults(response);
}

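The core formula of RRF is: score = Σ 1/(k + rank_i), where k is the rankConstant (default 60, not to be confused with the k of kNN) and rank_i is the document's 1-based rank in the i-th result list. A document missing from a list contributes nothing for that list.

The advantage of this algorithm is that vector scores and BM25 scores never need to be normalized: fusion is based purely on rank, so differing score scales do not matter. In practice it has proven more stable than linear weighting in most scenarios.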
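To make the formula concrete, here is a small standalone sketch that fuses two ranked ID lists the way the formula describes. It is an illustration of the arithmetic only, not Elasticsearch's internal implementation; RrfFusion, scores and fuse are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative RRF: score(doc) = sum over lists of 1 / (rankConstant + rank). */
final class RrfFusion {

    private RrfFusion() {}

    /** Accumulates the RRF score of every document that appears in any list. */
    static Map<String, Double> scores(List<List<String>> rankedLists, int rankConstant) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (List<String> ranked : rankedLists) {
            for (int i = 0; i < ranked.size(); i++) {
                int rank = i + 1;  // ranks are 1-based
                scores.merge(ranked.get(i), 1.0 / (rankConstant + rank), Double::sum);
            }
        }
        return scores;
    }

    /** Returns the topK document IDs ordered by descending RRF score. */
    static List<String> fuse(List<List<String>> rankedLists, int rankConstant, int topK) {
        Map<String, Double> scores = scores(rankedLists, rankConstant);
        List<String> ids = new ArrayList<>(scores.keySet());
        ids.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return new ArrayList<>(ids.subList(0, Math.min(topK, ids.size())));
    }
}
```

For example, with a kNN list [A, B, C], a BM25 list [B, D, A] and rankConstant 60, B scores 1/62 + 1/61 and A scores 1/61 + 1/63, so the fused order is B, A, D, C: a document ranked well in both lists beats one that tops only a single list.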

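5.2 More Complex Hybrid Query Scenarios

Sometimes business logic has to be layered on top of hybrid search, for example boosting high-sales products or filtering out delisted ones: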

public List<ProductSearchResult> advancedHybridSearch(
        String queryText, SearchFilters filters, int topK) throws IOException {

    float[] queryVector = embeddingClient.embed(queryText);

    SearchResponse<Map> response = esClient.search(s -> s
        .index(INDEX_NAME)
        .knn(k -> k
            .field("title_vector")
            .queryVector(toFloatList(queryVector))
            .k(topK * 2)
            .numCandidates(topK * 10)
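            // Exclude delisted products (assumes a keyword field "status", which the
            // mapping in 2.3 does not define). The category filter further down applies
            // only to the BM25 side; repeat it here if kNN hits must honor it too.
            .filter(f -> f
                .term(t -> t.field("status").value("online")))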
        )
        .query(q -> q
            .functionScore(fs -> fs
                .query(innerQ -> innerQ
                    .bool(b -> b
                        // Full-text retrieval
                        .must(m -> m.multiMatch(mm -> mm
                            .fields("title^2", "description")
                            .query(queryText)
                            .type(TextQueryType.BestFields)
                        ))
                        // Filter conditions
                        .filter(f -> f.term(t -> t
                            .field("status").value("online")))
                        .filter(f -> filters.getCategory() != null
                            ? f.term(t -> t.field("category")
                                .value(filters.getCategory()))
                            : f.matchAll(ma -> ma))
                    )
                )
                // Sales-based boosting function
                .functions(fn -> fn
                    .fieldValueFactor(fvf -> fvf
                        .field("sales_count")
                        .factor(0.1)
                        .modifier(FieldValueFactorModifier.Log1p)
                        .missing(0.0)
                    )
                )
                .boostMode(FunctionBoostMode.Sum)
            )
        )
        .rank(r -> r.rrf(rrf -> rrf.rankConstant(60L)))
        .size(topK),
        Map.class
    );

    return parseResults(response);
}

6. Query Performance Tuning

6.1 Diagnosing Slow Queries with the Profile API

public void profileQuery(String queryText) throws IOException {
    float[] queryVector = embeddingClient.embed(queryText);

    SearchResponse<Map> response = esClient.search(s -> s
        .index(INDEX_NAME)
        .profile(true)  // enable profiling
        .knn(k -> k
            .field("title_vector")
            .queryVector(toFloatList(queryVector))
            .k(20)
            .numCandidates(100)
        )
        .size(20),
        Map.class
    );

    // The profile result contains the time spent in each query phase
    if (response.profile() != null) {
        response.profile().shards().forEach(shard -> {
            log.info("Shard {} Profile: {}", shard.id(), shard.searches());
        });
    }
}

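6.2 Impact of HNSW Search Parameters

// Elasticsearch has no index-level ef_search setting (index.knn.algo_param.ef_search
// belongs to the OpenSearch k-NN plugin). The equivalent knob is num_candidates,
// set per search request.
// query_vector is shortened here; a real request needs all 1536 values.
POST /products/_search
{
  "knn": {
    "field": "title_vector",
    "query_vector": [0.12, -0.03, 0.57],
    "k": 10,
    "num_candidates": 100
  }
}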

Measured results (1 million documents, 1536-dimensional vectors):

ef_search	P50 latency	P99 latency	Recall@10
16	8ms	25ms	88%
32	10ms	30ms	94%
64	15ms	45ms	97%
100	22ms	65ms	98.5%

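6.3 Vector Quantization: Reducing Memory Usage

ES 8.12+ supports int8_hnsw, which stores vectors quantized to INT8. Memory usage drops by about 75% (one byte per dimension instead of four), and the accuracy loss is usually within 1%: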

PUT /products
{
  "mappings": {
    "properties": {
      "title_vector": {
        "type": "dense_vector",
        "dims": 1536,
        "index": true,
        "similarity": "cosine",
        "index_options": {
          "type": "int8_hnsw",
          "m": 16,
          "ef_construction": 100
        }
      }
    }
  }
}

When memory is tight in production, this optimization is worth considering.
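The "about 75%" figure follows directly from the storage width per dimension. The back-of-the-envelope sketch below works it out for the 1 million × 1536 setup used earlier; it counts raw vector bytes only and deliberately ignores HNSW graph links and per-vector quantization metadata, so real numbers will be somewhat higher. VectorMemory is a hypothetical helper name.

```java
/** Rough estimate of raw vector storage; ignores graph links and quantization metadata. */
final class VectorMemory {

    private VectorMemory() {}

    /** vectorCount * dims * bytesPerDim, e.g. 4 bytes for float32 and 1 byte for int8. */
    static long rawBytes(long vectorCount, int dims, int bytesPerDim) {
        return vectorCount * dims * bytesPerDim;
    }

    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024 * 1024);
    }
}
```

For 1,000,000 vectors of 1536 dimensions, float32 needs 6,144,000,000 bytes (about 5.7 GiB) while int8 needs 1,536,000,000 bytes (about 1.4 GiB), which is one quarter of the original.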

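7. Vector Search Integration in Spring Data Elasticsearch

If you are on Spring Boot, Spring Data Elasticsearch 5.x already supports vector search, but its API is not yet complete; for complex queries I recommend using the native ES Java client directly.

// Simple scenarios can use Spring Data Elasticsearch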
@Document(indexName = "products")
public class ProductDocument {

    @Id
    private String productId;

    @Field(type = FieldType.Text, analyzer = "ik_max_word")
    private String title;

    @Field(type = FieldType.Keyword)
    private String category;

    // For now the vector field is only marked with @Field; the detailed mapping is supplied via JSON
    @Field(type = FieldType.Dense_Vector, dims = 1536)
    private float[] titleVector;
}

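8. The Complete Search Architecture

9. Pitfall Roundup

Pitfall 1: dense_vector fields cannot be aggregated

ES dense_vector fields do not support aggregations (aggs), so you cannot group or compute statistics on a vector field. Many people have run into this limitation. If you need aggregations, run them on business fields.

Pitfall 2: kNN scores and query scores are not directly comparable

kNN returns a similarity score (0 to 1), while a BM25 score is an unbounded TF-IDF-style weight (typically 1 to 20 or more), so a direct linear combination of the two is meaningless. RRF is currently the most robust approach. If you must use linear weighting, normalize each score list first.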
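If linear weighting is unavoidable, the normalization step might look like the sketch below: each score list is min-max scaled to [0, 1] on its own, then the two are combined with a weight. This is one possible scheme under stated assumptions, not a recommendation from the measurements in this post; ScoreBlend, minMax and blend are hypothetical names, and a document missing from one list is assumed to contribute 0 for that list.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative min-max normalization followed by a weighted sum of two score lists. */
final class ScoreBlend {

    private ScoreBlend() {}

    /** Scales scores to [0, 1]; if all scores are equal, every document gets 1.0. */
    static Map<String, Double> minMax(Map<String, Double> scores) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores.values()) {
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        double range = max - min;
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            out.put(e.getKey(), range == 0 ? 1.0 : (e.getValue() - min) / range);
        }
        return out;
    }

    /** knnWeight in [0, 1]; the BM25 side receives 1 - knnWeight. */
    static Map<String, Double> blend(
            Map<String, Double> knn, Map<String, Double> bm25, double knnWeight) {
        Map<String, Double> out = new LinkedHashMap<>();
        minMax(knn).forEach((id, s) -> out.put(id, knnWeight * s));
        minMax(bm25).forEach((id, s) -> out.merge(id, (1 - knnWeight) * s, Double::sum));
        return out;
    }
}
```

Note that min-max scaling is sensitive to outliers in each result list, which is one reason rank-based fusion tends to be the more stable default.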

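Pitfall 3: a badly chosen numCandidates leads to low recall

I have seen people set numCandidates equal to k, and vector search recall was terrible as a result. numCandidates should be at least 3 times k; 5 to 10 times is recommended.

Pitfall 4: a missing IK analyzer makes mapping creation fail

If you specify analyzer: "ik_max_word" without installing the IK analysis plugin, index creation fails with an analyzer-not-found error. Install the plugin first: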

./bin/elasticsearch-plugin install https://github.com/infinilabs/analysis-ik/releases/download/v8.13.0/elasticsearch-analysis-ik-8.13.0.zip

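Pitfall 5: updating a vector field is slower than updating an ordinary field

The vector index is built when a document is written. If you update a dense_vector field, ES rebuilds the vector index entry for that document, and this is slower than an ordinary field update. When updating vector fields in bulk, keep an eye on write latency.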

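10. Results in Production

In our product search scenario (15 million products, 50 million searches per day on average), migrating from pure BM25 to BM25 + kNN hybrid search produced the following:

Search click-through rate rose 18% (A/B experiment, 99% confidence)
Long-tail queries (more than 4 terms) showed the biggest relevance gain, with click-through rate up 32%
Exact queries (brand + model combinations) performed on par with pure BM25, with no regression
Average query latency rose from 12ms to 28ms (within the acceptable range)

Hybrid search strikes a good balance between semantic understanding and exact keyword matching, and it is the approach I currently recommend most in engineering practice.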