Post #1931: JSON Schema Design Guidelines for Function Calling: Helping LLMs Understand Tool Capabilities More Accurately

Lao Zhang · 2026/4/30 · about 12 min read


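When I first started building Function Calling features, I hit plenty of pitfalls. Once I defined an order-lookup tool with a carelessly written schema. The model sometimes passed orderId as a string and sometimes as an integer. Once it even filled the status field, which should have held an enum value, with the natural-language "已取消" ("cancelled") instead of the CANCELLED I expected. The whole tool-calling chain kept breaking, and it took me days of debugging to realize the root cause wasn't in the code but in the schema design.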

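JSON Schema looks simple: write a few field names and types and you're done. But in the LLM setting it is more than a data contract. It is the only manual the model has for understanding what your tool can do. In my tests, a well-designed schema took the call success rate from roughly 70% to 98%. With a bad schema, you spend a lot of time on parameter validation and exception handling and still never fix the underlying problem.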

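In this post I've collected the schema design lessons from two years of Function Calling work, from basic conventions to production pitfalls. I hope it saves you a few months of detours.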

Why Schema Design Matters So Much

Let me start with a counterintuitive conclusion: when an LLM calls a tool, it doesn't "understand" your code logic. It only reads your schema.

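What does that mean? The description field in a schema is not an optional comment. The model relies on it to decide when to call a tool and how to fill in its parameters. If your description is vague, the model improvises, and the result is usually not what you want.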

I ran a simple experiment that tested the same tool with two versions of the schema:

Version A (written carelessly):

{
  "name": "get_order",
  "description": "Get order",
  "parameters": {
    "type": "object",
    "properties": {
      "id": {
        "type": "string"
      }
    }
  }
}

Version B (carefully designed):

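{
  "name": "get_order_by_id",
  "description": "Looks up an order's details by order ID, including order status, item list, shipping address, and payment information. Note: this tool only accepts internal system order numbers (format ORD-XXXXXX), not the short order numbers shown to users.",
  "parameters": {
    "type": "object",
    "properties": {
      "order_id": {
        "type": "string",
        "description": "Internal system order number: ORD- followed by 6 digits, e.g. ORD-123456",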
        "pattern": "^ORD-\\d{6}$"
      }
    },
    "required": ["order_id"]
  }
}

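Across 100 test calls, version A produced 31 parameter-format errors and version B only 2. Those 2 happened because the model genuinely didn't have the correct order number, not because of the schema. The gap speaks for itself.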
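The pattern in version B helps the model produce the right format. Unless your provider enforces the schema on its side, though, nothing guarantees that the arguments actually match. Below is a minimal sketch of re-checking them on the server. It assumes the tool-call arguments have already been parsed into a Map<String, Object>. OrderIdCheck and validate are hypothetical names, not part of any SDK.

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical server-side check that re-applies schema version B's "pattern" to parsed arguments.
final class OrderIdCheck {
    // Same regex as in the schema: "ORD-" followed by exactly 6 digits.
    private static final Pattern ORDER_ID = Pattern.compile("^ORD-\\d{6}$");

    private OrderIdCheck() {}

    /** Returns an error message for the model, or Optional.empty() if order_id is valid. */
    static Optional<String> validate(Map<String, Object> args) {
        Object raw = args.get("order_id");
        if (raw == null) {
            return Optional.of("order_id is required");
        }
        if (!(raw instanceof String)) {
            // Catches the "sometimes a string, sometimes an integer" failure mode.
            return Optional.of("order_id must be a string, got " + raw.getClass().getSimpleName());
        }
        String value = (String) raw;
        if (!ORDER_ID.matcher(value).matches()) {
            return Optional.of("order_id must look like ORD-123456, got: " + value);
        }
        return Optional.empty();
    }
}
```

Returning the message as the tool result, rather than throwing, gives the model a chance to correct the argument and retry. Whether that beats failing fast depends on your flow.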

Basic Conventions: Five Things You Must Get Right

1. Name tools with a verb + noun action phrase

The tool name is the model's first signal when it decides which tool to use. The name itself should be self-explanatory.

// Bad names
order
user_data
process

// Good names
get_order_by_id
create_customer_profile
calculate_shipping_cost
search_products_by_keyword
cancel_order_with_reason

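Choose verbs precisely:
- get: fetch a single item
- search/list: fetch multiple items
- create: create something new
- update: modify
- delete/remove: delete
- calculate/compute: compute a value
- validate: check

Don't use process or handle for everything. Those words give the model no extra information.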
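The naming rule is mechanical enough to check in CI. Below is a rough sketch. It assumes snake_case names and uses the verb list from the paragraph above, plus cancel and apply, which appear in this article's examples. ToolNameLint is a hypothetical class, and its verb sets are a starting point to adapt.

```java
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical lint: tool names must be snake_case and start with a recognized verb.
final class ToolNameLint {
    // Verbs from the guidance above; extend to match your own conventions.
    private static final Set<String> VERBS = Set.of(
        "get", "search", "list", "create", "update", "delete", "remove",
        "calculate", "compute", "validate", "cancel", "apply");
    // Verbs that carry no extra information for the model.
    private static final Set<String> VAGUE = Set.of("process", "handle", "do", "manage");
    // At least two lowercase words joined by underscores, e.g. get_order_by_id.
    private static final Pattern SNAKE_CASE = Pattern.compile("^[a-z]+(_[a-z0-9]+)+$");

    private ToolNameLint() {}

    /** Returns the problems found; an empty list means the name passes. */
    static List<String> check(String name) {
        if (!SNAKE_CASE.matcher(name).matches()) {
            return List.of("name should be a snake_case verb_noun phrase: " + name);
        }
        String verb = name.substring(0, name.indexOf('_'));
        if (VAGUE.contains(verb)) {
            return List.of("verb '" + verb + "' is too vague");
        }
        if (!VERBS.contains(verb)) {
            return List.of("unrecognized leading verb '" + verb + "'");
        }
        return List.of();
    }
}
```

This only checks form. Whether the noun part is precise enough still takes a human reviewer.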

2. The description must contain four elements

A proper tool description should include:

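What the tool does (functionality)
When to use it (use cases)
When not to use it (exclusions; many people forget this one)
Special constraints (format requirements, preconditions, and so on)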

// Example: defining a tool in Java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ToolDefinitionBuilder {
    
    public static Map<String, Object> buildGetOrderTool() {
        Map<String, Object> tool = new HashMap<>();
        tool.put("type", "function");
        
        Map<String, Object> function = new HashMap<>();
        function.put("name", "get_order_by_id");
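        function.put("description",
            "Looks up the full details of a single order by order ID. " +
            "Use when: the user asks about the status or details of one specific order. " +
            "Do not use when: the user wants several orders or a time-range query; use the list_orders tool instead. " +
            "Precondition: the user must be logged in and the order must belong to that user."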
        );
        
        // 参数定义
        Map<String, Object> parameters = buildOrderParameters();
        function.put("parameters", parameters);
        
        tool.put("function", function);
        return tool;
    }
    
    private static Map<String, Object> buildOrderParameters() {
        Map<String, Object> params = new HashMap<>();
        params.put("type", "object");
        
        Map<String, Object> properties = new HashMap<>();
        Map<String, Object> orderId = new HashMap<>();
        orderId.put("type", "string");
        orderId.put("description", "System order ID: ORD- followed by 6 digits");
        orderId.put("pattern", "^ORD-\\d{6}$");
        properties.put("order_id", orderId);
        
        params.put("properties", properties);
        params.put("required", List.of("order_id"));
        return params;
    }
}
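One way to make the four elements hard to forget is to build descriptions from named parts instead of free text. The helper below throws if any element is blank. ToolDescription.of is hypothetical and not from any library. The English labels are an assumption, so use whatever wording your prompts already use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: assembles a tool description from the four required elements
// and refuses to build one if any element is missing.
final class ToolDescription {
    private ToolDescription() {}

    static String of(String what, String whenToUse, String whenNotToUse, String constraints) {
        List<String> missing = new ArrayList<>();
        if (isBlank(what)) missing.add("what");
        if (isBlank(whenToUse)) missing.add("whenToUse");
        if (isBlank(whenNotToUse)) missing.add("whenNotToUse");
        if (isBlank(constraints)) missing.add("constraints");
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException("description is missing: " + missing);
        }
        // Mirrors the structure of the hand-written description above: one labeled sentence per element.
        return what.trim()
            + " Use when: " + whenToUse.trim()
            + " Do not use when: " + whenNotToUse.trim()
            + " Precondition: " + constraints.trim();
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }
}
```

Forcing all four elements is deliberately strict. If a tool really has no constraints, writing "none" explicitly still tells the model something.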

3. Constrain enumerated values with enum; don't rely on the description

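This is the mistake I see most often. Someone writes in the description "status can be awaiting payment, paid, shipped, completed, or cancelled", sets the type to string, and adds no enum. The model then sometimes sends Chinese, sometimes English, sometimes pinyin.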

// Wrong
{
  "status": {
    "type": "string",
    "description": "Order status, one of: PENDING_PAYMENT (awaiting payment), PAID (paid), SHIPPED (shipped), COMPLETED (completed), CANCELLED (cancelled)"
  }
}

// Right
{
  "status": {
    "type": "string",
    "description": "Order status",
    "enum": ["PENDING_PAYMENT", "PAID", "SHIPPED", "COMPLETED", "CANCELLED"],
    "enumDescriptions": {
      "PENDING_PAYMENT": "Awaiting payment",
      "PAID": "Paid",
      "SHIPPED": "Shipped",
      "COMPLETED": "Completed",
      "CANCELLED": "Cancelled"
    }
  }
}

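Note: enumDescriptions is not a standard JSON Schema keyword. It is a custom extension meant to give the model a human-readable meaning for each enum value. How it is handled depends on the provider: it may be passed through to the model, silently ignored, or, in strict schema modes, rejected. The safer, portable option is to put the mapping in the description: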

{
  "status": {
    "type": "string",
    "description": "Order status enum. PENDING_PAYMENT=awaiting payment, PAID=paid, SHIPPED=shipped, COMPLETED=completed, CANCELLED=cancelled",
    "enum": ["PENDING_PAYMENT", "PAID", "SHIPPED", "COMPLETED", "CANCELLED"]
  }
}
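If the same values also exist as a Java enum in your backend, you can generate both the enum list and the value-to-meaning mapping in the description from that one source. That way the two never drift apart. The sketch below works under that assumption. OrderStatus and toSchemaField are hypothetical names, and the labels are illustrative.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical: one Java enum drives both the schema's "enum" list and the mapping text in "description".
enum OrderStatus {
    PENDING_PAYMENT("awaiting payment"),
    PAID("paid"),
    SHIPPED("shipped"),
    COMPLETED("completed"),
    CANCELLED("cancelled");

    private final String label;

    OrderStatus(String label) {
        this.label = label;
    }

    static Map<String, Object> toSchemaField(String summary) {
        // Enum constant names become the allowed values.
        List<String> names = Arrays.stream(values()).map(Enum::name).collect(Collectors.toList());
        // "NAME=label" pairs, in declaration order, as in the description-based mapping above.
        String mapping = Arrays.stream(values())
            .map(s -> s.name() + "=" + s.label)
            .collect(Collectors.joining(", "));
        Map<String, Object> field = new LinkedHashMap<>();
        field.put("type", "string");
        field.put("description", summary + ". " + mapping);
        field.put("enum", names);
        return field;
    }
}
```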

4. Be explicit about integer vs. number, and add range constraints

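{
  "page_size": {
    "type": "integer",
    "description": "Number of records to return per page",
    "minimum": 1,
    "maximum": 100,
    "default": 20
  },
  "discount_rate": {
    "type": "number",
    "description": "Discount rate, a decimal between 0.0 and 1.0; e.g. 0.8 means the price is 80% of the original (20% off)",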
    "minimum": 0.0,
    "maximum": 1.0
  }
}

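Here is what happens without a range. I once had a pagination tool whose pageSize had no maximum. In one conversation the model passed pageSize=10000 and brought the database query to its knees.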
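The range in the schema tells the model what is allowed. It does not stop a bad value from reaching your database unless the provider validates arguments strictly. The defensive sketch below repeats the page_size bounds and default on the server. In JSON Schema, "default" is only an annotation, so the server has to apply it itself. PageSizeGuard is a hypothetical name, and clamping rather than rejecting is a design choice you may not want.

```java
import java.util.Map;

// Hypothetical defense in depth for page_size: the schema's minimum/maximum/default are repeated
// here because the schema alone does not prevent an out-of-range value.
final class PageSizeGuard {
    static final int MIN = 1;
    static final int MAX = 100;
    static final int DEFAULT = 20;

    private PageSizeGuard() {}

    static int resolve(Map<String, Object> args) {
        Object raw = args.get("page_size");
        if (raw == null) {
            return DEFAULT; // apply the schema default ourselves
        }
        if (!(raw instanceof Number)) {
            throw new IllegalArgumentException("page_size must be an integer, got: " + raw);
        }
        Number n = (Number) raw;
        if (n.doubleValue() != Math.rint(n.doubleValue())) {
            // Rejects fractional values (and NaN), since the schema says "integer".
            throw new IllegalArgumentException("page_size must be an integer, got: " + raw);
        }
        // Clamp into [MIN, MAX]; returning an error to the model is an equally valid choice.
        return (int) Math.max(MIN, Math.min(MAX, n.longValue()));
    }
}
```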

5. Fill in required deliberately; leave out anything that isn't truly required

Many people fill in required without much thought, but it directly affects how the model decides to call the tool.

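If a parameter is required and the model doesn't have enough information, it will typically ask the user a follow-up question. If the parameter isn't required, the model may simply leave it out of the call and rely on the default. Decide which parameters are truly mandatory based on your business logic.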

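// Example: parameter design for a search tool
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

Map<String, Object> searchParams = new LinkedHashMap<>();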
searchParams.put("type", "object");

Map<String, Object> properties = new LinkedHashMap<>();

// keyword is genuinely required: without it there is nothing to search
Map<String, Object> keyword = new HashMap<>();
keyword.put("type", "string");
keyword.put("description", "Search keywords; fuzzy matching on product name, brand, and model");
keyword.put("minLength", 1);
keyword.put("maxLength", 100);
properties.put("keyword", keyword);

// category is an optional filter, so it is not required
Map<String, Object> category = new HashMap<>();
category.put("type", "string");
category.put("description", "Product category filter; omit to search all categories");
category.put("enum", Arrays.asList("electronics", "clothing", "food", "sports", "home"));
properties.put("category", category);

// page defaults to the first page, so it is not required either
Map<String, Object> page = new HashMap<>();
page.put("type", "integer");
page.put("description", "Page number, starting at 1; defaults to 1");
page.put("minimum", 1);
page.put("default", 1);
properties.put("page", page);

searchParams.put("properties", properties);
searchParams.put("required", List.of("keyword")); // only keyword is required
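required shapes what the model asks the user for, but the backend should still enforce it and apply the defaults for optional parameters itself. The sketch below reads the required list and each property's default directly from a Map schema like the one built above. SearchArgs.normalize is hypothetical and not a library API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: enforce a Map schema's "required" list and fill in "default" values
// for optional parameters the model left out.
final class SearchArgs {
    private SearchArgs() {}

    @SuppressWarnings("unchecked")
    static Map<String, Object> normalize(Map<String, Object> schema, Map<String, Object> args) {
        Map<String, Object> result = new HashMap<>(args);
        // Every name in "required" must be present in the model's arguments.
        for (Object field : (List<Object>) schema.getOrDefault("required", List.of())) {
            if (!result.containsKey(field)) {
                throw new IllegalArgumentException("missing required parameter: " + field);
            }
        }
        // Optional parameters that were omitted get their schema default, if one is declared.
        Map<String, Object> properties = (Map<String, Object>) schema.get("properties");
        for (Map.Entry<String, Object> e : properties.entrySet()) {
            Map<String, Object> prop = (Map<String, Object>) e.getValue();
            if (!result.containsKey(e.getKey()) && prop.containsKey("default")) {
                result.put(e.getKey(), prop.get("default"));
            }
        }
        return result;
    }
}
```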

Advanced Techniques: Making Schemas More Precise

Designing nested objects

When a tool's parameters have a complex nested structure, use $defs or inline object types rather than flattening every field into one level.

{
  "name": "create_shipment",
  "description": "Creates a shipment",
  "parameters": {
    "type": "object",
    "properties": {
      "order_id": {
        "type": "string",
        "description": "ID of the associated order"
      },
      "recipient": {
        "type": "object",
        "description": "Recipient information",
        "properties": {
          "name": {
            "type": "string",
            "description": "Recipient name, 2-20 characters",
            "minLength": 2,
            "maxLength": 20
          },
          "phone": {
            "type": "string",
            "description": "Mainland China mobile number, 11 digits",
            "pattern": "^1[3-9]\\d{9}$"
          },
          "address": {
            "type": "object",
            "description": "Full address",
            "properties": {
              "province": {"type": "string", "description": "Province"},
              "city": {"type": "string", "description": "City"},
              "district": {"type": "string", "description": "District/county"},
              "street": {"type": "string", "description": "Street and house number"}
            },
            "required": ["province", "city", "district", "street"]
          }
        },
        "required": ["name", "phone", "address"]
      },
      "carrier": {
        "type": "string",
        "description": "Courier company",
        "enum": ["SF", "YTO", "STO", "ZTO", "YUNDA", "JD"]
      }
    },
    "required": ["order_id", "recipient", "carrier"]
  }
}
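In deeply nested schemas it's easy to forget required on an inner object. The small recursive lint below assumes schemas are held as nested Maps, like the ones in this article. RequiredLint is a hypothetical name. It descends into properties and array items only, not oneOf/anyOf branches. Run against the create_shipment schema above, it should report nothing, since every object level declares required.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical lint: report every object node that declares "properties" but no "required" list,
// including nested objects and array "items".
final class RequiredLint {
    private RequiredLint() {}

    static List<String> findMissingRequired(Map<String, Object> schema) {
        List<String> problems = new ArrayList<>();
        walk(schema, "$", problems);
        return problems;
    }

    @SuppressWarnings("unchecked")
    private static void walk(Map<String, Object> node, String path, List<String> problems) {
        Object props = node.get("properties");
        if (props instanceof Map) {
            if (!node.containsKey("required")) {
                problems.add(path + " has properties but no required list");
            }
            // Recurse into each property that is itself a schema object.
            for (Map.Entry<String, Object> e : ((Map<String, Object>) props).entrySet()) {
                if (e.getValue() instanceof Map) {
                    walk((Map<String, Object>) e.getValue(), path + "." + e.getKey(), problems);
                }
            }
        }
        Object items = node.get("items");
        if (items instanceof Map) {
            walk((Map<String, Object>) items, path + "[]", problems);
        }
    }
}
```

An object whose fields are all optional is legitimate. The lint only forces you to look at it deliberately.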

Writing array types correctly

{
  "product_ids": {
    "type": "array",
    "description": "List of product IDs to query, at least 1 and at most 50",
    "items": {
      "type": "string",
      "description": "Product ID, format PROD-XXXXXX",
      "pattern": "^PROD-\\d{6}$"
    },
    "minItems": 1,
    "maxItems": 50,
    "uniqueItems": true
  }
}

The uniqueItems: true constraint matters. It tells the model not to pass duplicate IDs, and it lets a validator reject them if it does.
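As with pattern, minItems, maxItems, and uniqueItems guide the model, and a server-side mirror catches whatever slips through. The minimal sketch below assumes the array has already been parsed into a List<String>. ProductIdsCheck is hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical server-side mirror of the product_ids constraints:
// minItems 1, maxItems 50, uniqueItems true, and the per-item pattern.
final class ProductIdsCheck {
    private static final Pattern PRODUCT_ID = Pattern.compile("^PROD-\\d{6}$");

    private ProductIdsCheck() {}

    static void validate(List<String> ids) {
        if (ids == null || ids.isEmpty() || ids.size() > 50) {
            throw new IllegalArgumentException("product_ids must contain 1 to 50 items");
        }
        Set<String> seen = new HashSet<>();
        for (String id : ids) {
            if (id == null || !PRODUCT_ID.matcher(id).matches()) {
                throw new IllegalArgumentException("bad product id: " + id);
            }
            // Set.add returns false when the element was already present.
            if (!seen.add(id)) {
                throw new IllegalArgumentException("duplicate product id: " + id);
            }
        }
    }
}
```

Rejecting duplicates tells the model it made a mistake. Silently de-duplicating is friendlier but hides the error, so pick one deliberately.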

Handling polymorphic parameters with oneOf/anyOf

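Some tools have parameters whose structure depends on the value of one field. A payment tool, for example, needs different fields for Alipay than for a bank card. That's where oneOf comes in. Provider support for oneOf versus anyOf varies, so check which one yours accepts: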

{
  "payment_info": {
    "description": "Payment information; the required fields depend on the payment method",
    "oneOf": [
      {
        "type": "object",
        "title": "Alipay",
        "properties": {
          "method": {"type": "string", "enum": ["ALIPAY"]},
          "alipay_account": {
            "type": "string",
            "description": "Alipay account (phone number or email)"
          }
        },
        "required": ["method", "alipay_account"]
      },
      {
        "type": "object",
        "title": "Bank card",
        "properties": {
          "method": {"type": "string", "enum": ["BANK_CARD"]},
          "card_number": {
            "type": "string",
            "description": "Bank card number, 16-19 digits",
            "pattern": "^\\d{16,19}$"
          },
          "bank_name": {
            "type": "string",
            "description": "Name of the issuing bank"
          }
        },
        "required": ["method", "card_number", "bank_name"]
      }
    ]
  }
}
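On the receiving side, the method field acts as a discriminator: read it first, then parse only the fields of that variant. The sketch below uses Java 16+ records. PaymentInfo, AlipayPayment, BankCardPayment, and PaymentInfoParser are hypothetical types, not from the article's codebase.

```java
import java.util.Map;

// Hypothetical: turn the oneOf payment_info argument into a typed object by switching
// on the "method" discriminator, so each branch reads only the fields its variant requires.
interface PaymentInfo {}

record AlipayPayment(String account) implements PaymentInfo {}

record BankCardPayment(String cardNumber, String bankName) implements PaymentInfo {}

final class PaymentInfoParser {
    private PaymentInfoParser() {}

    static PaymentInfo parse(Map<String, Object> info) {
        Object method = info.get("method");
        if ("ALIPAY".equals(method)) {
            return new AlipayPayment(requireString(info, "alipay_account"));
        }
        if ("BANK_CARD".equals(method)) {
            return new BankCardPayment(requireString(info, "card_number"), requireString(info, "bank_name"));
        }
        throw new IllegalArgumentException("unknown payment method: " + method);
    }

    // Mirrors the "required" list of each oneOf branch.
    private static String requireString(Map<String, Object> info, String key) {
        Object value = info.get(key);
        if (!(value instanceof String) || ((String) value).isEmpty()) {
            throw new IllegalArgumentException("missing or non-string field: " + key);
        }
        return (String) value;
    }
}
```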

Engineering Practice: A Type-Safe Schema Definition Layer in Java

In a real project, schemas shouldn't live as scattered Maps. You want a type-safe way to define them. Here's the approach I use in my projects:

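// A type-safe builder for tool parameter definitions
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class SchemaBuilder {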
    
    private final Map<String, Object> schema = new LinkedHashMap<>();
    private final Map<String, Object> properties = new LinkedHashMap<>();
    private final List<String> required = new ArrayList<>();
    
    public static SchemaBuilder object() {
        SchemaBuilder builder = new SchemaBuilder();
        builder.schema.put("type", "object");
        return builder;
    }
    
    public SchemaBuilder stringField(String name, String description) {
        Map<String, Object> field = new LinkedHashMap<>();
        field.put("type", "string");
        field.put("description", description);
        properties.put(name, field);
        return this;
    }
    
    public SchemaBuilder stringField(String name, String description, Consumer<StringFieldBuilder> config) {
        StringFieldBuilder fieldBuilder = new StringFieldBuilder(description);
        config.accept(fieldBuilder);
        properties.put(name, fieldBuilder.build());
        return this;
    }
    
    public SchemaBuilder integerField(String name, String description, Consumer<IntegerFieldBuilder> config) {
        IntegerFieldBuilder fieldBuilder = new IntegerFieldBuilder(description);
        config.accept(fieldBuilder);
        properties.put(name, fieldBuilder.build());
        return this;
    }
    
    public SchemaBuilder require(String... fields) {
        required.addAll(Arrays.asList(fields));
        return this;
    }
    
    public Map<String, Object> build() {
        schema.put("properties", properties);
        if (!required.isEmpty()) {
            schema.put("required", required);
        }
        return schema;
    }
    
    // Builder for string fields
    public static class StringFieldBuilder {
        private final Map<String, Object> field = new LinkedHashMap<>();
        
        public StringFieldBuilder(String description) {
            field.put("type", "string");
            field.put("description", description);
        }
        
        public StringFieldBuilder pattern(String regex) {
            field.put("pattern", regex);
            return this;
        }
        
        public StringFieldBuilder enumValues(String... values) {
            field.put("enum", Arrays.asList(values));
            return this;
        }
        
        public StringFieldBuilder minLength(int min) {
            field.put("minLength", min);
            return this;
        }
        
        public StringFieldBuilder maxLength(int max) {
            field.put("maxLength", max);
            return this;
        }
        
        public Map<String, Object> build() {
            return field;
        }
    }
    
    // Builder for integer fields
    public static class IntegerFieldBuilder {
        private final Map<String, Object> field = new LinkedHashMap<>();
        
        public IntegerFieldBuilder(String description) {
            field.put("type", "integer");
            field.put("description", description);
        }
        
        public IntegerFieldBuilder minimum(int min) {
            field.put("minimum", min);
            return this;
        }
        
        public IntegerFieldBuilder maximum(int max) {
            field.put("maximum", max);
            return this;
        }
        
        public IntegerFieldBuilder defaultValue(int value) {
            field.put("default", value);
            return this;
        }
        
        public Map<String, Object> build() {
            return field;
        }
    }
}

Using it is clean:

// Defining tool parameters with the builder
Map<String, Object> params = SchemaBuilder.object()
    .stringField("order_id", "Order ID, format ORD-XXXXXX",
        f -> f.pattern("^ORD-\\d{6}$"))
    .stringField("status", "Order status to set",
        f -> f.enumValues("PENDING_PAYMENT", "PAID", "SHIPPED", "COMPLETED", "CANCELLED"))
    .stringField("remark", "Remark, optional",
        f -> f.maxLength(200))
    .integerField("operator_id", "Employee ID of the operator",
        f -> f.minimum(1))
    .require("order_id", "status", "operator_id")
    .build();

Handling Schema Conflicts Across Many Tools

Once you register many tools, a common problem appears: the boundaries between tools blur, and the model can't tell which one to use.

Say you have all of these at once:

search_products: search products by keyword
get_product_by_id: look up a product by ID
list_products_by_category: list products in a category

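If all three descriptions just say "query products", the model's choice among them becomes close to arbitrary. The fix is to spell out each tool's scenario explicitly in its description: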

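[
  {
    "name": "search_products",
    "description": "Use when the user describes the product they want in natural language, e.g. 'find a good-value Bluetooth headset' or 'any gift ideas for my mom?'. Supports fuzzy matching and semantic search."
  },
  {
    "name": "get_product_by_id",
    "description": "Use when the exact product ID is known. Only for queries with a definite product ID (format PROD-XXXXXX); not for keyword search."
  },
  {
    "name": "list_products_by_category",
    "description": "Use when the user wants to browse all products in a category, e.g. 'show me your phones' or 'list all sports products'. Returns the products in that category; keyword filtering is not supported."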
  }
]

In my experience, this noticeably improves how accurately the model picks the right tool.
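Judging whether descriptions truly separate tools takes a human, but the degenerate case above, with several tools all saying "query products", can be caught automatically. Below is a rough heuristic sketch. ToolBoundaryLint is hypothetical. A character-count threshold means different things for Chinese and English text, so tune it for your language.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical heuristic for a tool registry: flag descriptions that are identical across tools
// or too short to tell one tool from another. It catches only obvious cases, not overlapping meaning.
final class ToolBoundaryLint {
    private ToolBoundaryLint() {}

    static List<String> check(Map<String, String> descriptionsByTool, int minLength) {
        List<String> problems = new ArrayList<>();
        Map<String, String> firstOwner = new HashMap<>();
        for (Map.Entry<String, String> e : descriptionsByTool.entrySet()) {
            String tool = e.getKey();
            String normalized = e.getValue().trim().toLowerCase();
            if (normalized.length() < minLength) {
                problems.add(tool + ": description is shorter than " + minLength + " characters");
            }
            // putIfAbsent returns the earlier tool that already used this exact description, if any.
            String other = firstOwner.putIfAbsent(normalized, tool);
            if (other != null) {
                problems.add(tool + ": same description as " + other);
            }
        }
        return problems;
    }
}
```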

A Complete Production-Grade Example

Finally, here's everything above pulled into one complete example, a tool set for an e-commerce customer-service scenario:

import java.util.List;
import java.util.Map;

import org.springframework.stereotype.Component;

@Component
public class CustomerServiceTools {
    
    public List<Map<String, Object>> getAllTools() {
        return List.of(
            buildGetOrderTool(),
            buildUpdateOrderStatusTool(),
            buildRefundTool(),
            buildSearchProductsTool()
        );
    }
    
    private Map<String, Object> buildGetOrderTool() {
        return ToolBuilder.create("get_order_by_id")
            .description("""
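                Looks up an order's details by order ID, including order status, item list, amount, shipping address, and logistics information.
                Use when: the user asks about a specific order, e.g. "Where is my order?" or "What's the status of my order?".
                Do not use when: the user hasn't given an order number; ask the user for it first, then call this tool.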
                """)
            .parameters(SchemaBuilder.object()
                .stringField("order_id", "System order ID: ORD- followed by 6 digits, e.g. ORD-123456",
                    f -> f.pattern("^ORD-\\d{6}$"))
                .require("order_id")
                .build())
            .build();
    }
    
    private Map<String, Object> buildRefundTool() {
        return ToolBuilder.create("apply_refund")
            .description("""
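                Applies for a refund on the specified order.
                Use when: the user has clearly said they want a refund, and the order ID and refund reason have been confirmed.
                Key constraints:
                1. Before applying, confirm the order status with get_order_by_id; only orders in PAID or SHIPPED status can be refunded
                2. Orders in COMPLETED status (past the 7-day window) cannot be refunded and must be handed off to a human agent
                3. The refund reason must come from the user explicitly; never fill it in on the user's behalf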
                """)
            .parameters(SchemaBuilder.object()
                .stringField("order_id", "ID of the order to refund, format ORD-XXXXXX",
                    f -> f.pattern("^ORD-\\d{6}$"))
                .stringField("reason_code", "Refund reason category",
                    f -> f.enumValues("QUALITY_ISSUE", "WRONG_ITEM", "CHANGE_MIND", "DELAYED_DELIVERY", "OTHER"))
                .stringField("reason_detail", "Detailed refund reason, in the user's own words",
                    f -> f.minLength(5).maxLength(500))
                .require("order_id", "reason_code", "reason_detail")
                .build())
            .build();
    }
    
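    // buildUpdateOrderStatusTool() and buildSearchProductsTool() are omitted; they follow the same pattern.
    // The ToolBuilder implementation is omitted too; it is a fluent API similar to SchemaBuilder.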
}
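The article leaves ToolBuilder out, so here is one possible shape. This hypothetical sketch produces the same {"type": "function", "function": {...}} envelope as the Map-based ToolDefinitionBuilder earlier, and it refuses to build a tool without a description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical ToolBuilder: wraps name, description, and parameters into the
// {"type": "function", "function": {...}} envelope used by the Map-based example earlier.
final class ToolBuilder {
    private final String name;
    private String description;
    private Map<String, Object> parameters;

    private ToolBuilder(String name) {
        this.name = name;
    }

    static ToolBuilder create(String name) {
        return new ToolBuilder(name);
    }

    ToolBuilder description(String description) {
        // Text blocks end with a newline; strip() trims only leading/trailing whitespace.
        this.description = description.strip();
        return this;
    }

    ToolBuilder parameters(Map<String, Object> parameters) {
        this.parameters = parameters;
        return this;
    }

    Map<String, Object> build() {
        if (description == null || description.isEmpty()) {
            throw new IllegalStateException("tool " + name + " has no description");
        }
        Map<String, Object> params = parameters;
        if (params == null) {
            // A tool with no arguments still gets an (empty) object schema.
            params = Map.of("type", "object", "properties", Map.of());
        }
        Map<String, Object> function = new LinkedHashMap<>();
        function.put("name", name);
        function.put("description", description);
        function.put("parameters", params);
        Map<String, Object> tool = new LinkedHashMap<>();
        tool.put("type", "function");
        tool.put("function", function);
        return tool;
    }
}
```

To call it from CustomerServiceTools in another package, the class and its methods would need to be public, in their own ToolBuilder.java file.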

Checking Whether Your Schema Design Holds Up

When I review schema designs, I go through this checklist:

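□ Is each tool name a verb + noun action phrase?
□ Does each description cover what the tool does, when to use it, and when not to use it?
□ Are all enumerated values constrained with enum rather than only listed in the description?
□ Do numeric parameters have minimum/maximum bounds?
□ Do pattern constraints match the required format precisely with a regex?
□ Does the required list contain only truly mandatory fields?
□ Is required filled in at every level of nested objects?
□ Do the descriptions draw clear boundaries between tools?
□ Are parameter names clear, avoiding ambiguous words like id/name?
□ Do long strings have a maxLength to block oversized input?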
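A few checklist items are mechanical enough to automate. The partial sketch below flags missing descriptions, unbounded strings, and numbers without both bounds. It inspects only top-level properties. It treats an enum or a pattern as bounding a string, which is a simplifying assumption. SchemaChecklist is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical partial automation of the checklist, for top-level properties only.
// Items about meaning (tool boundaries, description quality) still need a human.
final class SchemaChecklist {
    private SchemaChecklist() {}

    @SuppressWarnings("unchecked")
    static List<String> check(Map<String, Object> parameters) {
        List<String> problems = new ArrayList<>();
        Map<String, Object> props = (Map<String, Object>) parameters.getOrDefault("properties", Map.of());
        for (Map.Entry<String, Object> e : props.entrySet()) {
            Map<String, Object> p = (Map<String, Object>) e.getValue();
            Object type = p.get("type");
            if (!p.containsKey("description")) {
                problems.add(e.getKey() + ": no description");
            }
            // Simplifying assumption: an enum or pattern counts as bounding the string.
            if ("string".equals(type)
                    && !p.containsKey("enum") && !p.containsKey("pattern") && !p.containsKey("maxLength")) {
                problems.add(e.getKey() + ": free-form string without maxLength");
            }
            if (("integer".equals(type) || "number".equals(type))
                    && !(p.containsKey("minimum") && p.containsKey("maximum"))) {
                problems.add(e.getKey() + ": number without both minimum and maximum");
            }
        }
        return problems;
    }
}
```

Applied to the SchemaBuilder usage example earlier, it would flag operator_id, which has a minimum but no maximum.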

Every time I go through this checklist, I find at least 2-3 problems. No exaggeration.

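Schema design is the foundation of Function Calling success. However good your code is, if the foundation is shaky, the building will lean sooner or later. Get the schema design right first, and you'll have far fewer problems later in tool calling and agent orchestration.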