No. 2235: Multimodal AI for E-commerce: Engineering Image Search, Product Understanding, and Automated Listing

Lao Zhang, 2026/4/30


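Audience: e-commerce platform engineers, Java backend developers, product-catalog engineering teams | Reading time: about 18 minutes | Key takeaway: starting from real e-commerce scenarios, this post tackles three core problems: photo-based product search, product attribute extraction, and listing assistance for merchants

I built the "search products by photo" feature three times before I got it right.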

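Version 1: color-histogram similarity. A user photographed a black backpack and got back a pile of black bags, none of them in the style he wanted. Same color, completely different style.

Version 2: I switched to a perceptual hash, which is more sensitive to shape. Testing showed that slightly rescaling the same product's main image made the similarity score fall off a cliff. It was not robust to lighting changes either.

Version 3: extract semantic features with a vision model, then turn them into vectors. This finally gave a similarity measure close to human perception: the same style in a different color counts as similar, while a different style does not, even when the color matches.

The progression across these three versions is really a shift in how the image is viewed: from "an image is a pixel matrix" to "an image is a carrier of semantic information".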

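The core technical approach to image search

In production, the two methods (perceptual hashing and vision embeddings) are used together:

Perceptual hashing for fast coarse filtering (cutting hundreds of millions of products down to tens of thousands of candidates)
Vision embeddings for fine-grained ranking (picking the most relevant Top-20 from those tens of thousands of candidates)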
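
To make the two-stage idea concrete, here is a minimal in-memory sketch: a Hamming-distance filter on 64-bit hashes, followed by cosine-similarity ranking on embeddings. The class `TwoStageImageSearch`, the `IndexedImage` record, and the `maxHammingDistance` parameter are hypothetical names for illustration. A real system at the scale described above would use a dedicated hash index and a vector database rather than a linear scan.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical two-stage retrieval: hash pre-filter, then cosine re-ranking. */
final class TwoStageImageSearch {

    /** Hypothetical index entry: product id, 64-bit perceptual hash, embedding. */
    record IndexedImage(String productId, long pHash, float[] vector) {}

    /** A candidate that survived the hash filter, with its cosine score. */
    record Scored(String productId, double score) {}

    /** Number of differing bits between two 64-bit hashes. */
    static int hammingDistance(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    /** Cosine similarity of two equal-length vectors; 0 if either has zero norm. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    static List<String> search(List<IndexedImage> index, long queryHash,
                               float[] queryVector, int maxHammingDistance, int topK) {
        List<Scored> scored = new ArrayList<>();
        for (IndexedImage img : index) {
            // Stage 1: cheap bitwise filter discards images whose hash is too far away
            if (hammingDistance(img.pHash(), queryHash) > maxHammingDistance) {
                continue;
            }
            // Stage 2: score the survivors by embedding similarity
            scored.add(new Scored(img.productId(), cosine(img.vector(), queryVector)));
        }
        scored.sort(Comparator.comparingDouble(Scored::score).reversed());
        return scored.stream().limit(topK).map(Scored::productId)
            .collect(Collectors.toList());
    }
}
```

The threshold trades recall for speed: a tight `maxHammingDistance` shrinks the candidate set but can drop the right product when the query photo differs a lot from the indexed main image.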

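/**
 * E-commerce image search service
 */
@Service
@Slf4j
@RequiredArgsConstructor  // Lombok: generates the constructor that injects the final fields below
public class ProductImageSearchService {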

    private final ImageHasher perceptualHasher;
    private final VisionEmbeddingService visionEmbedding;
    private final VectorStore vectorStore;
    private final ProductRepository productRepo;

    /**
     * Search products by image
     *
     * @param queryImage   the query image uploaded by the user
     * @param searchConfig search settings (category restriction, price range, etc.)
     */
    public List<ProductSearchResult> searchByImage(
            byte[] queryImage,
            ProductSearchConfig searchConfig) {

        long startMs = System.currentTimeMillis();

        // Step 1: preprocess the image
        byte[] processedImage = preprocessImage(queryImage);

        // Step 2: compute the perceptual hash (for the fast index).
        // Note: this simplified listing does not use pHash below; the hash-based coarse filter is omitted.
        long pHash = perceptualHasher.computeHash(processedImage);

        // Step 3: compute the semantic vector with the vision model
        float[] semanticVector = visionEmbedding.embed(processedImage);

        // Step 4: vector similarity search (the main source of results)
        List<ScoredDocument> vectorResults = vectorStore.search(
            "product_images",
            semanticVector,
            100,  // fetch 100 candidates, re-ranked below
            buildFilter(searchConfig)
        );

        // Step 5: load product details and re-rank
        List<ProductSearchResult> results = vectorResults.stream()
            .map(doc -> buildSearchResult(doc, semanticVector))
            .sorted(Comparator.comparingDouble(ProductSearchResult::getScore).reversed())
            .limit(20)
            .collect(Collectors.toList());

        log.info("image search done: latencyMs={} results={}",
            System.currentTimeMillis() - startMs, results.size());

        return results;
    }

    /**
     * Vectorize a product image (called when building the index)
     */
    public void indexProduct(String productId, byte[] mainImage, Map<String, Object> metadata) {
        byte[] processed = preprocessImage(mainImage);
        float[] vector = visionEmbedding.embed(processed);
        long pHash = perceptualHasher.computeHash(processed);
        
        vectorStore.upsert("product_images", ProductImageDocument.builder()
            .id("product_" + productId)
            .productId(productId)
            .vector(vector)
            .perceptualHash(pHash)
            .metadata(metadata)
            .indexedAt(Instant.now())
            .build());
    }
}
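
The listing does not show what `perceptualHasher.computeHash` does internally. One common choice for a 64-bit perceptual hash is the difference hash (dHash), sketched below under that assumption. The class name `DifferenceHash` is hypothetical, and unlike the article's `ImageHasher` it takes an already decoded `BufferedImage` (for example from `ImageIO.read`) rather than raw bytes.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

/** Hypothetical dHash sketch: one bit per horizontal brightness comparison. */
final class DifferenceHash {

    /** Computes a 64-bit hash from a 9x8 downscaled copy of the image. */
    static long compute(BufferedImage source) {
        // Shrink to 9 columns x 8 rows so each row yields 8 left/right comparisons
        BufferedImage small = new BufferedImage(9, 8, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = small.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(source, 0, 0, 9, 8, null);
        } finally {
            g.dispose();
        }

        long hash = 0L;
        for (int y = 0; y < 8; y++) {
            for (int x = 0; x < 8; x++) {
                int left = luminance(small.getRGB(x, y));
                int right = luminance(small.getRGB(x + 1, y));
                // Bit is 1 when brightness increases from left to right
                hash = (hash << 1) | (left < right ? 1L : 0L);
            }
        }
        return hash;
    }

    /** Integer approximation of perceived brightness (0..255) for a packed RGB pixel. */
    static int luminance(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return (299 * r + 587 * g + 114 * b) / 1000;
    }

    /** Number of differing bits between two hashes; smaller means more alike. */
    static int distance(long a, long b) {
        return Long.bitCount(a ^ b);
    }
}
```

Two images are then compared with `DifferenceHash.distance(h1, h2)`. How well this holds up under rescaling and lighting changes has to be measured on real product and buyer photos, which is exactly where the second version described earlier fell short.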

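Intelligent product attribute extraction

When listing a product, merchants have to fill in a lot of attributes: category, color, material, style, usage scenario, and so on.

Filling these in is tedious, and the quality is uneven (some merchants just enter "Other", which leads to poor search recall).

AI attribute extraction can infer these attributes automatically from the product image: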

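/**
 * AI-based product attribute extraction service
 */
@Service
@RequiredArgsConstructor  // Lombok: generates the constructor that injects the final fields below
public class ProductAttributeExtractor {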

    private final VisionModelClient visionClient;
    private final CategoryAttributeSchema categorySchema;

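    /**
     * Extract attributes from a product image
     *
     * @param productImage the product image
     * @param categoryId   product category ID (each category has its own attribute set)
     */
    public ProductAttributes extract(byte[] productImage, String categoryId) {
        // Look up the attribute definitions to extract for this category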
        AttributeSchema schema = categorySchema.get(categoryId);
        
        String prompt = buildExtractionPrompt(schema);
        
        VisionResponse response = visionClient.analyze(
            VisionRequest.builder()
                .image(productImage)
                .prompt(prompt)
                .maxTokens(500)
                .build()
        );
        
        return parseAttributes(response.getContent(), schema);
    }

    private String buildExtractionPrompt(AttributeSchema schema) {
        StringBuilder prompt = new StringBuilder();
        prompt.append("Based on the product image, extract the following attributes. Be as accurate as possible and return null for any attribute you are unsure of.\n\n");
        prompt.append("Attributes to extract:\n");

        schema.getAttributes().forEach(attr -> {
            prompt.append(String.format("- %s (%s): %s\n",
                attr.getName(),
                attr.getType(),  // enum / text / number
                attr.getOptions() != null ?
                    "allowed values: " + String.join("/", attr.getOptions()) :
                    "free text"
            ));
        });

        prompt.append("\nReturn JSON where each key is an attribute name and each value is the extracted value.");

        return prompt.toString();
    }

    /**
     * Example attribute schemas for common categories
     */
    @PostConstruct
    private void initCategorySchemas() {
        // Women's clothing category
        categorySchema.put("women_clothing", AttributeSchema.builder()
            .attributes(Arrays.asList(
                Attribute.enumType("Color",
                    "black", "white", "red", "blue", "green", "pink", "gray", "other"),
                Attribute.enumType("Style",
                    "casual", "formal", "sporty", "sweet", "minimalist", "vintage"),
                Attribute.enumType("Fit",
                    "slim", "loose", "straight", "A-line"),
                Attribute.textType("Material"),
                Attribute.enumType("Neckline",
                    "crew neck", "V-neck", "turtleneck", "boat neck", "lapel collar"),
                Attribute.enumType("Sleeve length",
                    "sleeveless", "short sleeve", "long sleeve", "three-quarter sleeve")
            ))
            .build());
    }
}
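
The `parseAttributes` step is not shown in the listing. Model output often wraps the JSON in extra text and may return values outside the allowed options, so some guarding is usually needed before the result is trusted. The sketch below shows one possible shape for that guard. `AttributeOutputGuard` and both of its methods are hypothetical, and the actual JSON parsing (with whatever JSON library the project uses) is assumed to happen between the two steps.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical guard for model output before it becomes product attributes. */
final class AttributeOutputGuard {

    /**
     * Returns the text from the first '{' to the last '}', dropping any
     * surrounding prose or code-fence markers; null if no object is found.
     */
    static String extractJsonObject(String modelOutput) {
        if (modelOutput == null) {
            return null;
        }
        int start = modelOutput.indexOf('{');
        int end = modelOutput.lastIndexOf('}');
        return (start >= 0 && end > start) ? modelOutput.substring(start, end + 1) : null;
    }

    /**
     * Keeps only attributes defined in the schema. For enum attributes the value
     * must be one of the allowed options; an empty option list marks a free-text
     * attribute. Unknown keys and invalid values are dropped, not guessed.
     */
    static Map<String, String> validate(Map<String, String> extracted,
                                        Map<String, List<String>> allowedOptions) {
        Map<String, String> clean = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : allowedOptions.entrySet()) {
            String value = extracted.get(entry.getKey());
            if (value == null || value.isBlank()) {
                continue;
            }
            String trimmed = value.trim();
            List<String> options = entry.getValue();
            if (options.isEmpty() || options.contains(trimmed)) {
                clean.put(entry.getKey(), trimmed);
            }
        }
        return clean;
    }
}
```

Dropping an invalid value leaves the field empty for the merchant to fill in, which is safer than storing a guess that would later pollute search filters.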

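Automatic product description generation

With the attributes in hand, combined with the image, the product detail copy can be generated automatically: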

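/**
 * Product copywriting generation service
 */
@Service
@RequiredArgsConstructor  // Lombok: generates the constructor that injects the final fields below
public class ProductDescriptionGenerator {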

    private final LLMClient llmClient;
    private final VisionModelClient visionClient;

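    /**
     * Generate the product title and detail description
     */
    public ProductContent generate(
            byte[] mainImage,
            ProductAttributes attributes,
            String categoryName) {

        // Use the vision model to understand the image content
        String imageDescription = visionClient.describe(mainImage,
            "Briefly describe this product's appearance, including color, style, and apparent material, to help generate product copy. Keep it within 100 characters.");

        // Combine the attributes and the image description to generate the copy
        String prompt = String.format("""
            Write copy that appeals to buyers for the following product:

            Product category: %s
            Product attributes: %s
            Image description: %s

            Please generate:
            1. Product title (within 30 characters, containing the core keywords and highlighting the selling points)
            2. Selling points (3-5 items, each within 10 characters)
            3. Detail description (about 200 characters, scenario-based, highlighting practical value)

            Note: keep the language honest and down-to-earth, do not exaggerate, and avoid prohibited words.
            Return the result as JSON.
            """,
            categoryName,
            attributes.toSummaryText(),
            imageDescription
        );

        LLMResponse response = llmClient.call(LLMRequest.of(prompt));

        return parseProductContent(response.getContent());
    }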
}
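
A prompt can ask for length limits and no prohibited words, but the model will not always comply, so a check after generation is a natural companion to `parseProductContent`. The sketch below is one way to do it. `ProductCopyChecker` is a hypothetical name, the limits (30 and 10 characters, 3 to 5 selling points) are taken from the prompt above, and the banned-word list is assumed to come from the platform's own compliance rules.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical post-generation check for titles and selling points. */
final class ProductCopyChecker {

    /** Length in Unicode code points, so CJK characters and emoji count as one each. */
    static int length(String s) {
        return s.codePointCount(0, s.length());
    }

    /** Returns a list of human-readable problems; an empty list means the copy passed. */
    static List<String> check(String title, List<String> sellingPoints,
                              List<String> bannedWords) {
        List<String> problems = new ArrayList<>();

        if (title == null || title.isBlank()) {
            problems.add("title is empty");
        } else if (length(title) > 30) {
            problems.add("title longer than 30 characters");
        }

        if (sellingPoints.size() < 3 || sellingPoints.size() > 5) {
            problems.add("expected 3-5 selling points, got " + sellingPoints.size());
        }
        for (String point : sellingPoints) {
            if (length(point) > 10) {
                problems.add("selling point too long: " + point);
            }
        }

        // Simple substring match; real compliance checks are usually stricter
        String allText = (title == null ? "" : title) + " " + String.join(" ", sellingPoints);
        for (String word : bannedWords) {
            if (allText.contains(word)) {
                problems.add("banned word: " + word);
            }
        }
        return problems;
    }
}
```

When the list is non-empty, the caller can retry generation with the problems appended to the prompt, or send the draft to the merchant for manual editing.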

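Performance safeguards for major promotions

During major promotions, image-search QPS can surge 10x in an instant. Performance safeguards need to be in place ahead of time: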

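/**
 * Cache layer for image vector search.
 * Caches results for hot query vectors to reduce load on the vector database.
 */
@Service
public class VectorSearchCache {

    private final Cache<String, List<ProductSearchResult>> cache;

    public VectorSearchCache() {
        // Cache the 10,000 hottest queries, TTL = 10 minutes
        this.cache = Caffeine.newBuilder()
            .maximumSize(10000)
            .expireAfterWrite(Duration.ofMinutes(10))
            .recordStats()
            .build();
    }

    /**
     * Cache key: an approximate hash of the vector.
     * Query vectors that fall into the same quantization buckets share a cache key;
     * vectors close to a bucket boundary can still end up with different keys.
     */
    public String computeCacheKey(float[] vector, ProductSearchConfig config) {
        // Quantize a sample of the vector to low precision so that close vectors tend to map to the same key
        int dims = Math.min(16, vector.length);        // sample at most 16 dimensions
        int stride = Math.max(1, vector.length / 16);  // avoid a zero stride for vectors shorter than 16
        int[] quantized = new int[dims];
        for (int i = 0; i < dims; i++) {
            quantized[i] = Math.round(vector[i * stride] * 10);
        }
        // Only the category is part of the key; add any other filter in config that changes the results
        return Arrays.toString(quantized) + "_" + config.getCategoryId();
    }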
    
    public Optional<List<ProductSearchResult>> get(String key) {
        return Optional.ofNullable(cache.getIfPresent(key));
    }
    
    public void put(String key, List<ProductSearchResult> results) {
        cache.put(key, results);
    }
}
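
The cache class above exposes `get` and `put` but the article does not show how the search path calls them. A typical pattern is cache-aside: look up the key, fall back to the vector search on a miss, then store the result. The sketch below illustrates that flow with a small LRU map from the standard library standing in for Caffeine so that it runs on its own. `CacheAsideSearch` and its members are hypothetical, and results are simplified to lists of product IDs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical cache-aside wrapper; a LinkedHashMap LRU stands in for Caffeine. */
final class CacheAsideSearch {

    private final Map<String, List<String>> lru;
    private int backendCalls = 0;

    CacheAsideSearch(int maxEntries) {
        // accessOrder=true makes iteration order least-recently-used first
        this.lru = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<String>> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /**
     * Returns the cached result for the key, or runs the (expensive) vector
     * search once and caches what it returns.
     */
    synchronized List<String> search(String cacheKey, Supplier<List<String>> vectorSearch) {
        List<String> cached = lru.get(cacheKey);
        if (cached != null) {
            return cached;  // hit: the vector database is not touched
        }
        backendCalls++;
        List<String> fresh = vectorSearch.get();
        lru.put(cacheKey, fresh);
        return fresh;
    }

    /** How many times the backing search actually ran (useful for hit-rate checks). */
    synchronized int backendCalls() {
        return backendCalls;
    }
}
```

The coarse `synchronized` here holds the lock during the backend call, which is acceptable for a sketch but not for a 10x traffic spike. With Caffeine, `cache.get(key, mappingFunction)` covers the same lookup-or-compute flow with per-key locking.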

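Lessons from practice

Lessons learned from building this kind of system:

Buyer photos and seller photos differ greatly in quality: photos that users take with their phones have poor lighting, skewed angles, and occlusions. The system needs some tolerance for low-quality query images, and results must not be evaluated only on high-quality main images.
Index freshness: newly listed products should enter the index as soon as possible, and a large number of products may go live during a major promotion. Batch indexing needs rate limiting and a priority queue so that it does not block real-time search.
Evaluate with real user behavior: a high similarity score does not mean the user is satisfied. Look at click-through rate and conversion rate. Sometimes the "exact same" product is not what the user wants; he may be looking for "a similar but cheaper alternative".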
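
The point about index freshness can be illustrated with a small sketch: indexing tasks go into a priority queue, and a scheduled worker takes at most a fixed number per tick, so new listings jump ahead of bulk re-indexing while total throughput stays capped. `IndexingQueue`, its `Priority` levels, and `drainTick` are hypothetical names. A production setup would more likely sit on a message queue with its own rate limiter.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical prioritized, throttled queue for product indexing tasks. */
final class IndexingQueue {

    /** Declaration order is processing order: new listings first, bulk jobs last. */
    enum Priority { NEW_LISTING, UPDATE, BULK_REINDEX }

    /** seq preserves first-in-first-out order among tasks of equal priority. */
    record Task(String productId, Priority priority, long seq) {}

    private final AtomicLong nextSeq = new AtomicLong();
    private final PriorityBlockingQueue<Task> queue = new PriorityBlockingQueue<>(
        64, Comparator.comparing(Task::priority).thenComparingLong(Task::seq));

    void submit(String productId, Priority priority) {
        queue.add(new Task(productId, priority, nextSeq.getAndIncrement()));
    }

    /**
     * Removes and returns at most maxPerTick product IDs, highest priority first.
     * Calling this on a fixed schedule caps indexing throughput so that it does
     * not compete with real-time search for embedding and vector-store capacity.
     */
    List<String> drainTick(int maxPerTick) {
        List<String> batch = new ArrayList<>();
        Task task;
        while (batch.size() < maxPerTick && (task = queue.poll()) != null) {
            batch.add(task.productId());
        }
        return batch;
    }

    int pending() {
        return queue.size();
    }
}
```

A scheduler would call `drainTick(n)` every interval and pass each returned ID to something like `indexProduct`. Tuning `n` and the interval is what sets the effective rate limit.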