Post #1838: Filter Optimization in Vector Search: The Pre-filter vs Post-filter Performance Trade-off

Lao Zhang · about 14 min read

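Vector search with filter conditions is a requirement that nearly every production RAG project runs into.

The most common scenarios:

"Search only the documents this user is allowed to see"
"Search only documents from this time range"
"Search only one department's knowledge base"

It looks simple, but depending on the implementation, performance can differ by tens of times. I have seen a project that applied business filtering directly on top of vector search results and watched QPS fall from 2000 to 80. This post explains the essential difference between Pre-filter and Post-filter, and how to choose a strategy based on your workload.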

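1. Three Basic Models of Filtering

In vector search, "ANN search with conditions" has three main strategies. The sections below cover Post-filter, Pre-filter, and a hybrid that chooses between approaches by selectivity.

Each strategy suits different scenarios; none is better in absolute terms.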

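2. Post-filter: The Simplest Approach, with a Flaw

2.1 How It Works

1. Run the ANN search and recall the Top-K results
2. Filter those K results by the business condition
3. Return what survives the filter

2.2 The Fatal Problem: Too Few Results When the Filter Is Highly Selective

Suppose K=10 results are required and the business filter has a selectivity of 10%, meaning only 10% of the documents in the whole collection satisfy the condition.

If matching documents are spread evenly with respect to similarity, the expected number of matching documents in the ANN Top-10 is 10 × 10% = 1.

In other words, you asked for 10 relevant documents and Post-filter can give you only 1. This is the "insufficient results" problem.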
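The shortfall is easy to reproduce on a toy corpus. The sketch below is only an illustration under stated assumptions: the `Doc` record, the 1-D "embedding", and the exact sort standing in for ANN recall are all hypothetical and not part of any vector database API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class PostFilterShortfallDemo {

    // Toy document: a 1-D "embedding" plus a flag standing in for the business filter.
    record Doc(int id, double position, boolean allowed) {}

    // Exact nearest-neighbour search, used here as a stand-in for ANN recall.
    static List<Doc> topK(List<Doc> docs, double query, int k) {
        return docs.stream()
            .sorted(Comparator.comparingDouble((Doc d) -> Math.abs(d.position() - query)))
            .limit(k)
            .collect(Collectors.toList());
    }

    // Post-filter without oversampling: recall the top k first, filter afterwards.
    static List<Doc> postFilter(List<Doc> docs, double query, int k) {
        return topK(docs, query, k).stream()
            .filter(Doc::allowed)
            .collect(Collectors.toList());
    }

    // 100 documents at positions 0..99; every 10th one passes the filter (10% selectivity).
    static List<Doc> buildCorpus() {
        List<Doc> docs = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            docs.add(new Doc(i, i, i % 10 == 0));
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Doc> hits = postFilter(buildCorpus(), 0.0, 10);
        System.out.println("requested 10, got " + hits.size());
    }
}
```

With the query at position 0, the ten nearest documents are 0 through 9 and only document 0 passes the filter, so the run reports a single hit for a request of 10.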

The fix is oversampling: enlarge the K that ANN recalls by a factor of 1/selectivity.

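import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/**
 * Oversampling strategy for Post-filter
 */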
public class PostFilterSearchService {

    private final VectorSearchEngine vectorEngine;

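    /**
     * Filtered vector search (Post-filter + oversampling)
     *
     * @param queryVec    query vector
     * @param filter      business filter condition
     * @param k           number of results wanted
     * @param selectivity estimated selectivity of the filter, in (0, 1];
     *                    the lower it is, the more oversampling is needed
     */
    public List<SearchResult> search(float[] queryVec,
                                      Predicate<Document> filter,
                                      int k, float selectivity) {
        if (selectivity <= 0 || selectivity > 1) {
            throw new IllegalArgumentException("selectivity must be in (0, 1]");
        }

        // Oversampled K: recall at least 3k candidates to guard against under-sampling
        int oversamplingK = Math.max((int)(k / selectivity), k * 3);
        // But cap it, otherwise the ANN search itself slows down.
        // With this cap, a very low selectivity can still return fewer than k results.
        oversamplingK = Math.min(oversamplingK, 1000);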

        List<Document> candidates = vectorEngine.search(queryVec, oversamplingK);

        // Apply the business filter
        List<Document> filtered = candidates.stream()
            .filter(filter)
            .limit(k)
            .collect(Collectors.toList());

        System.out.printf("Post-filter: ANN recalled %d, kept %d of %d requested%n",
            candidates.size(), filtered.size(), k);

        return filtered.stream()
            .map(doc -> new SearchResult(doc.getId(), doc.getContent()))
            .collect(Collectors.toList());
    }

    public record SearchResult(String docId, String content) {}
}
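One thing the 1/selectivity factor hides is that it only reaches k in expectation. As a rough sketch, assume each recalled candidate passes the filter independently with probability equal to the selectivity (an idealization; real filters are often correlated with similarity). Then the number of survivors is binomial, and the chance of actually getting k of them can be computed directly. The class and method names below are made up for this illustration.

```java
class OversamplingOdds {

    /**
     * P(X >= k) for X ~ Binomial(n, p): the chance that n recalled candidates
     * contain at least k that pass a filter with selectivity p.
     */
    static double probAtLeast(int n, double p, int k) {
        if (k <= 0) return 1.0;
        if (n < k || p <= 0.0) return 0.0;
        if (p >= 1.0) return 1.0;

        // pmf(0) = (1-p)^n, then pmf(i+1) = pmf(i) * (n-i)/(i+1) * p/(1-p)
        double pmf = Math.pow(1.0 - p, n);
        double below = 0.0;
        for (int i = 0; i < k; i++) {
            below += pmf;
            pmf *= (double) (n - i) / (i + 1) * (p / (1.0 - p));
        }
        return Math.max(0.0, 1.0 - below);
    }

    public static void main(String[] args) {
        // k = 10, selectivity = 10%
        System.out.printf("recall 100 candidates: %.3f%n", probAtLeast(100, 0.10, 10));
        System.out.printf("recall 200 candidates: %.3f%n", probAtLeast(200, 0.10, 10));
    }
}
```

Under that independence assumption, recalling exactly k/selectivity = 100 candidates fills all 10 slots only a little more often than not, while doubling the candidate count makes a shortfall rare. That is one argument for adding headroom on top of the 1/selectivity factor.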

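2.3 When Post-filter Is Appropriate

Post-filter works well when:

The filter's selectivity is above 20% (a large share of documents satisfy the condition)
Recall requirements are loose, and being one or two results short is acceptable
The filter condition is only known at query time (for example, real-time user permission checks)

2.4 The Root Cause of Post-filter's Performance Problem

Back to the opening example: after filtering was added, QPS fell from 2000 to 80. Oversampling alone does not explain that. What they had built was this:

The ANN search recalled 10000 candidates (because the filter was extremely selective), and then each of those 10000 results was filtered by querying the database.

Those 10000 database queries and their I/O were the performance killer, not the vector search itself.

The right approach is to store the attributes needed for filtering directly in the vector database (as scalar fields), so there is no lookup back to the database.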

/**
 * Store the filter fields in Milvus as scalar fields, so filtering never goes back to the database
 */
public class MilvusAwarePostFilter {

    private final MilvusClientV2 milvusClient;

    /**
     * Filter on scalar fields stored next to the vectors in Milvus.
     * Every field used in the filter must be written to Milvus ahead of time.
     */
    public List<SearchResult> search(String collectionName,
                                      float[] queryVec,
                                      String filterExpr,
                                      int k) {
        // Milvus evaluates the filter expression itself, so the application
        // neither oversamples nor looks up attributes per candidate.
        // floatToList and SearchResult are the helpers defined in MilvusPreFilterSearch.
        var response = milvusClient.search(SearchReq.builder()
            .collectionName(collectionName)
            .data(List.of(floatToList(queryVec)))
            .annsField("embedding")
            .topK(k)
            // Milvus filter expression over the stored scalar fields
            .filter(filterExpr)
            .outputFields(List.of("doc_id", "content", "department_id", "created_timestamp"))
            .build());

        return response.getSearchResults().get(0).stream()
            .map(r -> new SearchResult(
                (String) r.getEntity().get("doc_id"),
                (String) r.getEntity().get("content"),
                r.getScore()
            ))
            .collect(Collectors.toList());
    }
}
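To make the cost difference concrete, here is a small sketch that counts round trips. Everything in it is hypothetical: `MetadataStore` stands in for the business database, and `Candidate` stands in for a search hit that already carries its scalar field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class FilterLookupCost {

    // Hypothetical metadata store; each departmentOf call stands for one database round trip.
    static class MetadataStore {
        private final Map<Integer, Integer> departmentByDoc = new HashMap<>();
        int lookups = 0;

        void put(int docId, int departmentId) {
            departmentByDoc.put(docId, departmentId);
        }

        int departmentOf(int docId) {
            lookups++;
            return departmentByDoc.get(docId);
        }
    }

    // A candidate that carries its filter attribute, as a scalar field would.
    record Candidate(int docId, int departmentId) {}

    // Variant A: one lookup per candidate returned by the vector search.
    static List<Integer> filterWithLookups(List<Integer> candidateIds,
                                           MetadataStore store, int departmentId) {
        return candidateIds.stream()
            .filter(id -> store.departmentOf(id) == departmentId)
            .collect(Collectors.toList());
    }

    // Variant B: the attribute travels with the candidate, so no lookup is needed.
    static List<Integer> filterInline(List<Candidate> candidates, int departmentId) {
        return candidates.stream()
            .filter(c -> c.departmentId() == departmentId)
            .map(Candidate::docId)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        MetadataStore store = new MetadataStore();
        List<Integer> ids = new ArrayList<>();
        List<Candidate> candidates = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            store.put(i, i % 100);
            ids.add(i);
            candidates.add(new Candidate(i, i % 100));
        }
        List<Integer> viaLookups = filterWithLookups(ids, store, 7);
        List<Integer> inline = filterInline(candidates, 7);
        System.out.println("lookups issued: " + store.lookups);
        System.out.println("same result: " + viaLookups.equals(inline));
    }
}
```

Both variants return the same documents; the only difference is that variant A issues one lookup per candidate, which is exactly the per-query cost the project above was paying.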

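3. Pre-filter: The Right Answer for Highly Selective Filters

3.1 How It Works

1. Run the scalar condition first to obtain the set of document IDs that satisfy it
2. Run the ANN search only within that ID set

Pre-filter solves the insufficient-results problem because the search happens inside the matching subset from the start.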
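The two steps can be sketched on a toy corpus. This is a minimal illustration, not how any particular vector database implements it: the `Doc` record and the exact sort standing in for ANN search are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class PreFilterSketch {

    // Toy document: a 1-D "embedding" plus a flag standing in for the scalar condition.
    record Doc(int id, double position, boolean allowed) {}

    // Pre-filter: restrict to the matching subset first, then take the nearest k inside it.
    static List<Doc> preFilter(List<Doc> docs, double query, int k) {
        return docs.stream()
            .filter(Doc::allowed)                                   // step 1: scalar condition
            .sorted(Comparator.comparingDouble(
                (Doc d) -> Math.abs(d.position() - query)))         // step 2: nearest within the subset
            .limit(k)
            .collect(Collectors.toList());
    }

    // 100 documents at positions 0..99; every 10th one satisfies the condition.
    static List<Doc> buildCorpus() {
        List<Doc> docs = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            docs.add(new Doc(i, i, i % 10 == 0));
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Doc> hits = preFilter(buildCorpus(), 0.0, 10);
        System.out.println("requested 10, got " + hits.size());
    }
}
```

With 10% selectivity and a request for 10 results, all 10 slots are filled, because the nearest-neighbour step never sees a document that fails the condition.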

3.2 Implementing Pre-filter

Approach 1: inverted index + vector search (Milvus's default scheme)

import io.milvus.v2.client.MilvusClientV2;
import io.milvus.v2.common.DataType;
import io.milvus.v2.common.IndexParam;
import io.milvus.v2.service.collection.request.AddFieldReq;
import io.milvus.v2.service.collection.request.CreateCollectionReq;
import io.milvus.v2.service.vector.request.SearchReq;
import io.milvus.v2.service.vector.response.SearchResp;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Pre-filter mode in Milvus.
 * A filter expression over scalar-indexed fields lets Milvus narrow
 * the candidate set for the ANN search.
 */
public class MilvusPreFilterSearch {

    private final MilvusClientV2 milvusClient;

    public MilvusPreFilterSearch(MilvusClientV2 milvusClient) {
        this.milvusClient = milvusClient;
    }

    /**
     * Create a collection with scalar indexes to support Pre-filter
     */
    public void createCollectionWithScalarIndex(String collectionName) {
        CreateCollectionReq.CollectionSchema schema = milvusClient.createSchema();
        schema.addField(AddFieldReq.builder()
            .fieldName("id").dataType(DataType.Int64)
            .isPrimaryKey(true).autoID(true).build());
        schema.addField(AddFieldReq.builder()
            .fieldName("doc_id").dataType(DataType.VarChar)
            .maxLength(64).build());
        schema.addField(AddFieldReq.builder()
            .fieldName("content").dataType(DataType.VarChar)
            .maxLength(4000).build());
        // Scalar fields used for filtering
        schema.addField(AddFieldReq.builder()
            .fieldName("department_id").dataType(DataType.Int32).build());
        schema.addField(AddFieldReq.builder()
            .fieldName("created_timestamp").dataType(DataType.Int64).build());
        schema.addField(AddFieldReq.builder()
            .fieldName("is_public").dataType(DataType.Bool).build());
        schema.addField(AddFieldReq.builder()
            .fieldName("embedding").dataType(DataType.FloatVector)
            .dimension(768).build());

        // Vector index
        var vectorIndex = IndexParam.builder()
            .fieldName("embedding")
            .indexType(IndexParam.IndexType.HNSW)
            .metricType(IndexParam.MetricType.COSINE)
            .extraParams(Map.of("M", 16, "efConstruction", 200))
            .build();

        // Scalar indexes (for efficient Pre-filter)
        var deptIndex = IndexParam.builder()
            .fieldName("department_id")
            .indexType(IndexParam.IndexType.INVERTED)  // inverted index
            .build();

        var timeIndex = IndexParam.builder()
            .fieldName("created_timestamp")
            .indexType(IndexParam.IndexType.STL_SORT)  // sorted index (suits range queries)
            .build();

        milvusClient.createCollection(CreateCollectionReq.builder()
            .collectionName(collectionName)
            .collectionSchema(schema)
            .indexParams(List.of(vectorIndex, deptIndex, timeIndex))
            .build());
    }

    /**
     * Vector search with a filter.
     * Milvus decides internally how to combine the filter with the ANN search;
     * whether search parameters can influence that choice depends on the
     * Milvus version, so check the documentation for your release.
     */
    public List<SearchResult> searchWithFilter(String collectionName,
                                                float[] queryVec,
                                                int departmentId,
                                                long startTime, long endTime,
                                                int topK) {
        // Milvus filter expression (supports AND/OR/NOT, SQL-like syntax)
        String filterExpr = String.format(
            "department_id == %d AND created_timestamp >= %d AND created_timestamp <= %d",
            departmentId, startTime, endTime
        );

        var response = milvusClient.search(SearchReq.builder()
            .collectionName(collectionName)
            .data(List.of(floatToList(queryVec)))
            .annsField("embedding")
            .topK(topK)
            .filter(filterExpr)
            .outputFields(List.of("doc_id", "content"))
            // radius and range_filter are range-search parameters: together they
            // bound the similarity of returned hits; they are not scalar filters
            .build());

        return parseResults(response.getSearchResults().get(0));
    }

    /**
     * Complex permission filter (several allowed departments + public documents)
     */
    public List<SearchResult> searchWithPermission(String collectionName,
                                                    float[] queryVec,
                                                    List<Integer> allowedDeptIds,
                                                    boolean includePublic,
                                                    int topK) {
        StringBuilder filterBuilder = new StringBuilder();

        // Department permission
        String deptFilter = "department_id in [" +
            allowedDeptIds.stream().map(String::valueOf)
                .collect(Collectors.joining(",")) + "]";

        if (includePublic) {
            filterBuilder.append("(").append(deptFilter)
                         .append(" OR is_public == true)");
        } else {
            filterBuilder.append(deptFilter);
        }

        var response = milvusClient.search(SearchReq.builder()
            .collectionName(collectionName)
            .data(List.of(floatToList(queryVec)))
            .annsField("embedding")
            .topK(topK)
            .filter(filterBuilder.toString())
            .outputFields(List.of("doc_id", "content", "is_public"))
            .build());

        return parseResults(response.getSearchResults().get(0));
    }

    private List<SearchResult> parseResults(
            List<SearchResp.SearchResult> rawResults) {
        return rawResults.stream()
            .map(r -> new SearchResult(
                (String) r.getEntity().get("doc_id"),
                (String) r.getEntity().get("content"),
                r.getScore()
            ))
            .collect(Collectors.toList());
    }

    private List<Float> floatToList(float[] arr) {
        List<Float> list = new ArrayList<>(arr.length);
        for (float v : arr) list.add(v);
        return list;
    }

    public record SearchResult(String docId, String content, float score) {}
}

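3.3 How Pre-filter Works Internally

Vector databases such as Milvus implement Pre-filter roughly as follows. In Pre-filter mode, HNSW checks each candidate node during graph traversal against the allowed set (a bitset) and skips nodes that are not in it.

This has a performance problem: if the allowed set is small (selectivity is very low), the traversal may have to visit a large number of nodes before it finds K that satisfy the condition, and performance drops sharply.

Rules of thumb:

Selectivity > 30%: Pre-filter works well
Selectivity 1% to 30%: needs special optimization
Selectivity < 1%: consider switching to "scalar query first, then exact ranking"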
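The effect of a small allowed set on traversal cost can be seen in a heavily simplified model. The sketch below is not HNSW: it assumes a single-layer chain graph in which node i sits at position i and links only to its two neighbours, and it runs a plain best-first search. The only part it shares with the description above is the bitset check applied to each visited node.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class FilteredTraversalSketch {

    // Accepted node ids plus the number of nodes the traversal had to visit.
    record Outcome(List<Integer> accepted, int visited) {}

    // Best-first traversal from `entry` towards `query`, accepting only nodes set in `allowed`.
    static Outcome search(int nodeCount, BitSet allowed, int entry, double query, int k) {
        PriorityQueue<Integer> frontier = new PriorityQueue<>(
            Comparator.comparingDouble((Integer n) -> Math.abs(n - query)));
        boolean[] seen = new boolean[nodeCount];
        List<Integer> accepted = new ArrayList<>();
        int visited = 0;

        frontier.add(entry);
        seen[entry] = true;
        while (!frontier.isEmpty() && accepted.size() < k) {
            int node = frontier.poll();
            visited++;
            if (allowed.get(node)) {            // the bitset check
                accepted.add(node);
            }
            // Nodes outside the allowed set are still expanded, only not returned.
            for (int next : new int[] {node - 1, node + 1}) {
                if (next >= 0 && next < nodeCount && !seen[next]) {
                    seen[next] = true;
                    frontier.add(next);
                }
            }
        }
        return new Outcome(accepted, visited);
    }

    // Allowed set containing every `step`-th node.
    static BitSet everyNth(int nodeCount, int step) {
        BitSet bits = new BitSet(nodeCount);
        for (int i = 0; i < nodeCount; i += step) {
            bits.set(i);
        }
        return bits;
    }

    public static void main(String[] args) {
        int n = 1000;
        for (int step : new int[] {1, 10, 100}) {
            Outcome o = search(n, everyNth(n, step), 0, 0.0, 5);
            System.out.printf("selectivity 1/%d: visited %d nodes for %d results%n",
                step, o.visited(), o.accepted().size());
        }
    }
}
```

In this toy graph, finding 5 results takes 5 visits when every node is allowed, 41 visits at 10% selectivity, and 401 visits at 1%. Real HNSW graphs have long-range links and several layers, so the numbers differ, but the trend matches the rules of thumb above.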

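4. Pre-filter vs Post-filter: Performance Comparison

Measured numbers from a test run (Milvus, 10 million vectors, 768 dimensions, 8-core CPU):

Scenario	Pre-filter QPS	Post-filter QPS
Selectivity 80%	1800	2100
Selectivity 50%	1500	1800
Selectivity 20%	1200	1400
Selectivity 10%	800	950
Selectivity 5%	400	1200 (but too few results)
Selectivity 1%	150	2000 (but far too few results)

Observations:

Selectivity of 10% and above: the two are close, with Post-filter slightly faster
Selectivity of 5% and below: Post-filter has higher QPS, but recall quality drops badly
If you only care about QPS and not about recall quality: Post-filter
If you need K high-quality results guaranteed: Pre-filter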
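"Recall quality" in the observations can be made measurable. One common way, sketched here as an assumption rather than as the method used for the table, is recall@k: the fraction of the true top-k matching documents (for example from an exact filtered search) that the strategy under test actually returned. The class name is hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RecallAtK {

    /**
     * Fraction of the ground-truth ids that appear in the returned ids.
     * Duplicates in either list are counted once.
     */
    static double recallAtK(List<String> returned, List<String> groundTruth) {
        Set<String> truth = new HashSet<>(groundTruth);
        if (truth.isEmpty()) {
            return 1.0; // nothing to find, so nothing was missed
        }
        long hits = returned.stream().distinct().filter(truth::contains).count();
        return (double) hits / truth.size();
    }

    public static void main(String[] args) {
        List<String> truth = List.of("d1", "d2", "d3", "d4", "d5",
                                     "d6", "d7", "d8", "d9", "d10");
        // A Post-filter run that came back with a single surviving document
        System.out.println(recallAtK(List.of("d1"), truth));
    }
}
```

A Post-filter run that returns 1 of the 10 expected documents scores 0.1 on this measure, however high its QPS.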

5. Hybrid Strategy: Choosing Dynamically by Selectivity

/**
 * Adaptive filter strategy.
 * Picks a search strategy dynamically from the filter's estimated selectivity
 */
public class AdaptiveFilterSearchService {

    private final MilvusClientV2 milvusClient;
    private final SelectivityEstimator selectivityEstimator;

    // Selectivity thresholds
    private static final float PRE_FILTER_THRESHOLD = 0.10f;  // below 10%: scalar-first instead of Pre-filter
    private static final float POST_FILTER_SAFE_THRESHOLD = 0.30f; // 30% and above: Post-filter

    public List<SearchResult> search(String collectionName,
                                      float[] queryVec,
                                      FilterCondition filter,
                                      int topK) {

        float selectivity = selectivityEstimator.estimate(
            collectionName, filter);

        System.out.printf("Estimated filter selectivity: %.2f%%%n", selectivity * 100);

        if (selectivity < PRE_FILTER_THRESHOLD) {
            // Very low selectivity: Pre-filter on HNSW slows down badly,
            // so switch strategy: scalar query first, then rank by vector
            return scalarFirstSearch(collectionName, queryVec, filter, topK);
        } else if (selectivity < POST_FILTER_SAFE_THRESHOLD) {
            // Medium selectivity: Pre-filter
            return preFilterSearch(collectionName, queryVec, filter, topK);
        } else {
            // High selectivity: Post-filter is more efficient
            return postFilterSearch(collectionName, queryVec, filter, topK,
                selectivity);
        }
    }

    /**
     * Very low selectivity: find candidates with a scalar query, then rank them by vector
     */
    private List<SearchResult> scalarFirstSearch(String collectionName,
                                                  float[] queryVec,
                                                  FilterCondition filter,
                                                  int topK) {
        // Step 1: scalar query for the matching documents, keeping their
        // embeddings so they can be ranked without a second round trip
        List<QueryResp.QueryResult> candidates = milvusClient.query(QueryReq.builder()
            .collectionName(collectionName)
            .filter(filter.toMilvusExpr())
            .outputFields(List.of("doc_id", "embedding"))
            .limit(10000)  // upper bound; matches beyond it are dropped
            .build())
            .getQueryResults();

        if (candidates.size() <= topK) {
            // No more candidates than topK: skip vector ranking and return them all
            // (content is left empty and the score is a placeholder)
            return candidates.stream()
                .map(r -> new SearchResult((String) r.getEntity().get("doc_id"), "", 0.0f))
                .collect(Collectors.toList());
        }

        // Step 2: rank the candidate set by vector similarity (brute force)
        // With a small candidate set, brute force is faster than HNSW
        System.out.printf("Scalar-first: scalar filter returned %d candidates, brute-force ranking%n",
            candidates.size());

        return bruteForceRank(queryVec, candidates, topK);
    }

    /**
     * Selectivity estimator.
     * Estimates a filter's selectivity from sampling or statistics
     */
    @Component
    public static class SelectivityEstimator {

        private final MilvusClientV2 milvusClient;
        // Cache of selectivity estimates per filter condition
        private final Cache<String, Float> selectivityCache;

        public SelectivityEstimator(MilvusClientV2 client) {
            this.milvusClient = client;
            this.selectivityCache = Caffeine.newBuilder()
                .maximumSize(500)
                .expireAfterWrite(10, TimeUnit.MINUTES)
                .build();
        }

        public float estimate(String collection, FilterCondition filter) {
            String cacheKey = collection + ":" + filter.cacheKey();
            Float cached = selectivityCache.getIfPresent(cacheKey);
            if (cached != null) return cached;

            // Estimate with Milvus count(*) queries
            // Note: this adds latency of its own, so weigh whether it is worth it
            try {
                var countResult = milvusClient.query(QueryReq.builder()
                    .collectionName(collection)
                    .filter(filter.toMilvusExpr())
                    .outputFields(List.of("count(*)"))
                    .build());

                long filteredCount = (long) countResult
                    .getQueryResults().get(0).getEntity().get("count(*)");

                var totalResult = milvusClient.query(QueryReq.builder()
                    .collectionName(collection)
                    .outputFields(List.of("count(*)"))
                    .build());

                long totalCount = (long) totalResult
                    .getQueryResults().get(0).getEntity().get("count(*)");

                float selectivity = totalCount == 0 ? 0 :
                    (float) filteredCount / totalCount;
                selectivityCache.put(cacheKey, selectivity);
                return selectivity;
            } catch (Exception e) {
                // Estimation failed: fall back to a medium value, which selects Pre-filter
                return 0.15f;
            }
        }
    }

    public record SearchResult(String docId, String content, float score) {}
}
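The listing above calls a `bruteForceRank` helper without showing it. A minimal sketch of what such a helper could look like follows, assuming cosine similarity (the metric the collection was indexed with earlier) and a hypothetical `Candidate` type that pairs a document ID with its embedding. It is one possible implementation, not the original one.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class BruteForceRankSketch {

    // Hypothetical candidate: a document id with its stored embedding.
    record Candidate(String docId, float[] embedding) {}

    record Scored(String docId, float score) {}

    // Cosine similarity; returns 0 when either vector has zero length.
    static float cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0f;
        }
        return (float) (dot / (Math.sqrt(normA) * Math.sqrt(normB)));
    }

    // Score every candidate against the query and keep the topK most similar.
    static List<Scored> rank(float[] query, List<Candidate> candidates, int topK) {
        return candidates.stream()
            .map(c -> new Scored(c.docId(), cosine(query, c.embedding())))
            .sorted(Comparator.comparingDouble((Scored s) -> s.score()).reversed())
            .limit(topK)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Candidate> candidates = List.of(
            new Candidate("a", new float[] {1f, 0f}),
            new Candidate("b", new float[] {0f, 1f}),
            new Candidate("c", new float[] {1f, 1f}));
        rank(new float[] {1f, 0f}, candidates, 2)
            .forEach(s -> System.out.println(s.docId() + " " + s.score()));
    }
}
```

The cost is linear in the number of candidates times the dimension, which is why this only pays off when the scalar query has already cut the set down to a few thousand documents at most.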

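6. Special Case: Vector Search Under User Permissions

In enterprise settings, permission filtering is the most frequent requirement, and the permission rules are usually complex.

6.1 Designing the Permission Model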

/**
 * Document-permission vector search based on RBAC
 */
public class RBACVectorSearchService {

    private final MilvusClientV2 milvusClient;
    private final PermissionService permissionService;

    /**
     * Build the filter expression for the documents a user may access.
     * Permission rule: a user can see a document that meets any of:
     *   1. it is public
     *   2. it belongs to one of the user's departments
     *   3. the user has been explicitly granted access to it
     */
    public String buildPermissionFilter(String userId) {
        UserPermissions perms = permissionService.getPermissions(userId);

        List<String> conditions = new ArrayList<>();

        // Public documents
        conditions.add("is_public == true");

        // Department documents
        if (!perms.departmentIds().isEmpty()) {
            String deptFilter = "department_id in [" +
                perms.departmentIds().stream()
                    .map(String::valueOf)
                    .collect(Collectors.joining(",")) + "]";
            conditions.add(deptFilter);
        }

        // Explicitly granted documents (as a set of document IDs)
        // Note: a large grant list (more than 100 IDs) does not belong in the
        // filter expression; searchWithLargePermissionSet handles that case
        if (!perms.explicitDocIds().isEmpty() &&
            perms.explicitDocIds().size() <= 100) {
            String docFilter = "doc_id in [\"" +
                String.join("\",\"", perms.explicitDocIds()) + "\"]";
            conditions.add(docFilter);
        }

        return "(" + String.join(" OR ", conditions) + ")";
    }

    /**
     * Handling a large explicit grant list (more than 100 documents):
     * a two-step search and merge instead of one huge IN expression
     */
    public List<SearchResult> searchWithLargePermissionSet(
            String collectionName,
            float[] queryVec,
            String userId,
            int topK) {

        UserPermissions perms = permissionService.getPermissions(userId);

        if (perms.explicitDocIds().size() <= 100) {
            // Normal case: use the filter expression
            String filter = buildPermissionFilter(userId);
            return executeSearch(collectionName, queryVec, filter, topK);
        }

        // Large explicit grant list: two-step strategy
        // Step 1: ANN search with the department + public conditions only (no explicit grants)
        String baseFilter = buildBaseFilter(perms);
        List<SearchResult> baseResults = executeSearch(
            collectionName, queryVec, baseFilter, topK * 2);

        // Step 2: fetch the explicitly granted documents (from cache/ES)
        // and rank them by vector similarity
        List<SearchResult> explicitResults = searchExplicitDocs(
            queryVec, perms.explicitDocIds(), topK);

        // Step 3: merge the two result sets and re-rank
        return mergeAndRank(baseResults, explicitResults, topK);
    }

    public record SearchResult(String docId, String content, float score) {}
}
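The `mergeAndRank` step is referenced above but not shown. One way to write it, sketched under two assumptions: both result lists carry scores from the same similarity metric where higher means more similar, and a document that appears in both lists should be kept once with its better score.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class MergeAndRankSketch {

    record SearchResult(String docId, String content, float score) {}

    // Merge two result lists, de-duplicate by docId, and keep the topK highest scores.
    static List<SearchResult> mergeAndRank(List<SearchResult> base,
                                           List<SearchResult> explicit,
                                           int topK) {
        Map<String, SearchResult> bestByDoc = new HashMap<>();
        Stream.concat(base.stream(), explicit.stream())
            .forEach(r -> bestByDoc.merge(r.docId(), r,
                (a, b) -> a.score() >= b.score() ? a : b));

        return bestByDoc.values().stream()
            .sorted(Comparator.comparingDouble((SearchResult r) -> r.score()).reversed()
                .thenComparing(SearchResult::docId))   // stable order for equal scores
            .limit(topK)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<SearchResult> base = List.of(
            new SearchResult("a", "", 0.9f),
            new SearchResult("b", "", 0.5f));
        List<SearchResult> explicit = List.of(
            new SearchResult("b", "", 0.8f),
            new SearchResult("c", "", 0.7f));
        mergeAndRank(base, explicit, 2)
            .forEach(r -> System.out.println(r.docId() + " " + r.score()));
    }
}
```

If the two sources score with different metrics, the scores would need to be normalized before merging; the sketch does not attempt that.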

6.2 How Permission Caching Affects Performance

/**
 * Cache of permission filter expressions.
 * Avoids recomputing permissions on every query
 */
@Component
public class PermissionFilterCache {

    private final Cache<String, String> filterCache = Caffeine.newBuilder()
        .maximumSize(10000)
        .expireAfterWrite(5, TimeUnit.MINUTES)  // permissions usually change rarely
        .build();

    public String getOrCompute(String userId,
                                Function<String, String> filterBuilder) {
        return filterCache.get(userId, filterBuilder);
    }

    /**
     * Invalidate proactively when permissions change
     */
    public void invalidateUser(String userId) {
        filterCache.invalidate(userId);
    }
}

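7. How Qdrant's Pre-filter Implementation Differs

Pre-filter implementations differ in detail from one vector database to another. Qdrant uses "payload filtering":

/**
 * Filtered query example for Qdrant.
 * By default Qdrant picks its filtering strategy automatically from the
 * filter's estimated selectivity.
 * The hnsw_ef parameter controls search depth
 */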
public class QdrantFilteredSearch {

    private final QdrantClient qdrantClient;

    public List<SearchResult> search(String collectionName,
                                      float[] queryVec,
                                      int departmentId,
                                      long startTime,
                                      int topK)
            throws java.util.concurrent.ExecutionException, InterruptedException {
        var filter = Filter.newBuilder()
            .addMust(Condition.newBuilder()
                .setField(FieldCondition.newBuilder()
                    .setKey("department_id")
                    .setMatch(Match.newBuilder()
                        .setInteger(departmentId)
                        .build())
                    .build())
                .build())
            .addMust(Condition.newBuilder()
                .setField(FieldCondition.newBuilder()
                    .setKey("created_at")
                    .setRange(Range.newBuilder()
                        .setGte(startTime)
                        .build())
                    .build())
                .build())
            .build();

        var searchParams = SearchParams.newBuilder()
            .setHnswEf(128)  // a larger ef improves recall under Pre-filter
            .build();

        // searchAsync returns a future holding the list of scored points
        var points = qdrantClient.searchAsync(
            SearchPoints.newBuilder()
                .setCollectionName(collectionName)
                .setFilter(filter)
                .addAllVector(floatToList(queryVec))
                .setLimit(topK)
                .setParams(searchParams)
                .setWithPayload(WithPayloadSelector.newBuilder()
                    .setEnable(true).build())
                .build()
        ).get();

        return points.stream()
            .map(r -> new SearchResult(
                r.getPayloadMap().get("doc_id").getStringValue(),
                r.getPayloadMap().get("content").getStringValue(),
                r.getScore()
            ))
            .collect(Collectors.toList());
    }

    private List<Float> floatToList(float[] arr) {
        List<Float> list = new ArrayList<>(arr.length);
        for (float v : arr) list.add(v);
        return list;
    }

    public record SearchResult(String docId, String content, float score) {}
}