AI Skill Templates and Starter Recipe: SKILL.md, Tool Spec, Evals, Tracing Checklist

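What this appendix is for

When you write a new Skill or workflow, the easiest place to get stuck is "where do I start, and what do I have to deliver?" This appendix gathers the templates that recur throughout the series onto one page, so you can copy them straight into your repository.

Note: this blog's Markdown renderer currently does not have GFM enabled, so I avoid table syntax and use headings and lists throughout.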

Template 1: SKILL.md (copy as-is)

# <Skill Name>

## What
One sentence on what it does (written for the person using it).

## When to use
- Scenario 1
- Scenario 2
- Scenario 3

## Non-goals (Important)
- Explicitly out of scope 1
- Explicitly out of scope 2

## Inputs (Contract)
- <field>: <type> (required|optional), default=<...>, constraints=<...>

## Outputs (Contract)
- Fixed structure / fixed section order (structured output is strongly recommended)

## Tools & Permissions
- <tool_name>: read|write, required scopes/whitelist

## Failure types & Fallbacks
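- INVALID_INPUT -> reject immediately and tell the caller how to fix the input
- TOOL_AUTH_REQUIRED -> guide the user through authorization
- TOOL_TIMEOUT -> retry / degrade / fall back to cache
- TOOL_RATE_LIMIT -> back off and retry / degrade
- LOW_CONFIDENCE -> trigger a human confirmation point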

## Human-in-the-loop gates
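- Any write operation requires confirmation by default (show the payload that will be written)
- Confirmation is required when budget/cost exceeds a threshold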

## Examples
### Example 1
Input:
{...}
Output:
{...}

## Versioning
Current version: x.y.z
Compatibility notes: which input/output changes break compatibility
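
The "Failure types & Fallbacks" section of the template is easier to keep honest when the same list also lives in code. Below is a minimal Python sketch under stated assumptions: the enum members mirror the template's failure types, but the action names in `FALLBACKS` and the helper `fallback_for` are hypothetical labels I made up for illustration, not part of any framework.

```python
from enum import Enum


class FailureType(Enum):
    """Failure types copied from the SKILL.md template."""
    INVALID_INPUT = "INVALID_INPUT"
    TOOL_AUTH_REQUIRED = "TOOL_AUTH_REQUIRED"
    TOOL_TIMEOUT = "TOOL_TIMEOUT"
    TOOL_RATE_LIMIT = "TOOL_RATE_LIMIT"
    LOW_CONFIDENCE = "LOW_CONFIDENCE"


# Hypothetical action names: the template only describes fallbacks in prose.
FALLBACKS = {
    FailureType.INVALID_INPUT: "reject_with_fix_hint",
    FailureType.TOOL_AUTH_REQUIRED: "prompt_authorization",
    FailureType.TOOL_TIMEOUT: "retry_then_degrade_or_cache",
    FailureType.TOOL_RATE_LIMIT: "backoff_retry_then_degrade",
    FailureType.LOW_CONFIDENCE: "ask_human_to_confirm",
}


def fallback_for(error_type: str) -> str:
    """Return the declared fallback action for an error_type string.

    Unknown error types raise ValueError, so a new failure type
    cannot slip in without a declared fallback.
    """
    try:
        return FALLBACKS[FailureType(error_type)]
    except ValueError:
        raise ValueError(f"undeclared failure type: {error_type!r}") from None
```

Failing loudly on an undeclared type is a design choice: it pushes you to update SKILL.md and the mapping together instead of letting a silent default hide the gap.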

Template 2: Tool Spec (schema + permissions + idempotency)

{
  "name": "github.create_comment",
  "description": "Create a PR comment. (write; requires human approval)",
  "input_schema": {
    "type": "object",
    "properties": {
      "repo": { "type": "string" },
      "pull_number": { "type": "integer" },
      "body_markdown": { "type": "string", "maxLength": 8000 },
      "idempotency_key": { "type": "string", "maxLength": 128 }
    },
    "required": ["repo", "pull_number", "body_markdown", "idempotency_key"],
    "additionalProperties": false
  },
  "permissions": {
    "scope": "write",
    "resource_whitelist": {
      "repos": ["org/repo-a", "org/repo-b"]
    },
    "requires_human_approval": true
  }
}
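
To show how the three parts of this spec (schema, permissions, idempotency) might be enforced at call time, here is a minimal Python sketch. It is hand-rolled with the standard library only and checks just the constraints visible in the spec above. `validate_call`, `CommentGateway`, and the `send` callback are hypothetical names for illustration; `send` stands in for whatever actually posts the comment, and the in-memory dictionary stands in for a durable idempotency store.

```python
from dataclasses import dataclass, field
from typing import Callable

ALLOWED_REPOS = {"org/repo-a", "org/repo-b"}  # mirrors resource_whitelist.repos
REQUIRED = ("repo", "pull_number", "body_markdown", "idempotency_key")


def validate_call(args: dict) -> list[str]:
    """Return a list of problems; an empty list means the input is acceptable."""
    errors = []
    missing = [k for k in REQUIRED if k not in args]
    extra = [k for k in args if k not in REQUIRED]  # additionalProperties: false
    if missing:
        errors.append(f"missing fields: {missing}")
    if extra:
        errors.append(f"unexpected fields: {extra}")
    if errors:
        return errors  # stop early: the checks below assume every field exists

    if not isinstance(args["repo"], str) or args["repo"] not in ALLOWED_REPOS:
        errors.append("repo is not on the whitelist")
    n = args["pull_number"]
    # PR numbers are whole numbers, so this sketch requires a positive int.
    if isinstance(n, bool) or not isinstance(n, int) or n < 1:
        errors.append("pull_number must be a positive integer")
    body = args["body_markdown"]
    if not isinstance(body, str) or len(body) > 8000:
        errors.append("body_markdown must be a string of at most 8000 chars")
    key = args["idempotency_key"]
    if not isinstance(key, str) or len(key) > 128:
        errors.append("idempotency_key must be a string of at most 128 chars")
    return errors


@dataclass
class CommentGateway:
    """Hypothetical wrapper around the real write call (`send`)."""
    send: Callable[[dict], str]
    _seen: dict = field(default_factory=dict)  # idempotency_key -> earlier result

    def create_comment(self, args: dict, approved: bool) -> str:
        errors = validate_call(args)
        if errors:
            raise ValueError("INVALID_INPUT: " + "; ".join(errors))
        if not approved:  # requires_human_approval: true
            raise PermissionError("human approval required for write")
        key = args["idempotency_key"]
        if key not in self._seen:  # a retry with the same key must not write twice
            self._seen[key] = self.send(args)
        return self._seen[key]
```

The order of the checks is deliberate: validation first, approval second, the idempotency lookup last, so a rejected or unapproved call never touches the store.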

Template 3: Evals cases (JSONL) and Rubric (checkable)

cases.jsonl (example)

{"id":"case_001","input":{"repo":"org/repo","pull_number":1},"expected":{"must_have_fields":["summary","risks"],"max_words":300}}
{"id":"case_002","input":{"repo":"org/repo","pull_number":999999},"expected":{"error_type":"INVALID_INPUT"}}

rubric.md (example)

# Rubric

Hard checks (fail if any):
- Output is valid JSON and matches schema
- Must-have fields exist: summary, risks
- Word budget <= max_words

Soft checks (0-2 points each):
- Coverage: mentions major changed areas
- Risk quality: risks are specific + mitigation suggested
- Faithfulness: no contradictions with tool outputs
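
The hard checks in the rubric are mechanical, so they can run as code against each line of cases.jsonl. The following Python sketch is one possible reading, not a definitive harness. `load_cases` and `hard_check` are hypothetical helper names, the full schema match is left out, and the word budget is assumed to count whitespace-separated words across the output's top-level string values.

```python
import json


def load_cases(text: str) -> list[dict]:
    """Parse JSONL text: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def hard_check(expected: dict, raw_output: str) -> list[str]:
    """Return the hard-check failures for one case (empty list = pass)."""
    try:
        out = json.loads(raw_output)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(out, dict):
        return ["output is not a JSON object"]

    # Error cases only assert on the error_type field.
    if "error_type" in expected:
        if out.get("error_type") != expected["error_type"]:
            return [f"expected error_type {expected['error_type']}"]
        return []

    failures = []
    for name in expected.get("must_have_fields", []):
        if name not in out:
            failures.append(f"missing field: {name}")

    max_words = expected.get("max_words")
    if max_words is not None:
        # Assumption: only top-level string values count toward the budget.
        words = sum(len(v.split()) for v in out.values() if isinstance(v, str))
        if words > max_words:
            failures.append(f"word budget exceeded: {words} > {max_words}")
    return failures
```

The soft checks (coverage, risk quality, faithfulness) need a human or model grader and are deliberately left out of this sketch.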

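Template 4: Tracing fields (minimal set)

Use these fields consistently across the whole chain (requests, model calls, and tool calls all carry them):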

  • task_id: one business task
  • run_id: one execution instance (changes on retry)
  • step_id: step number
  • skill_name / skill_version
  • tool_name / tool_call_id
  • user_id / tenant_id (if any)
  • ok / error_type
  • latency_ms
  • tokens_in / tokens_out (or cost)
  • cache: hit/miss (if any)
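
One way to keep these fields uniform is to define them once as a record type and emit one JSON line per step. The sketch below is illustrative only. `TraceEvent`, `to_json_line`, and `new_run_id` are hypothetical names, and which fields are treated as optional is my own assumption based on the "(if any)" notes in the list above.

```python
import json
import uuid
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TraceEvent:
    # Always present
    task_id: str
    run_id: str
    step_id: int
    skill_name: str
    skill_version: str
    ok: bool
    latency_ms: int
    # Present only when applicable (assumed optional in this sketch)
    tool_name: Optional[str] = None
    tool_call_id: Optional[str] = None
    user_id: Optional[str] = None
    tenant_id: Optional[str] = None
    error_type: Optional[str] = None
    tokens_in: Optional[int] = None
    tokens_out: Optional[int] = None
    cache: Optional[str] = None  # "hit" or "miss"

    def to_json_line(self) -> str:
        """Serialize to one JSON line, dropping unset optional fields."""
        return json.dumps({k: v for k, v in asdict(self).items() if v is not None})


def new_run_id() -> str:
    """A fresh run_id per execution: a retry calls this again, task_id stays fixed."""
    return uuid.uuid4().hex
```

Keeping `task_id` stable while `run_id` changes on each retry lets you group all attempts of one business task and still tell the attempts apart.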

Starter recipe: building a Skill from 0 to 1 (shortest path)

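  • First write SKILL.md: boundaries, contracts, failure types
  • Then build read-only tools: connect to real data
  • Then add structured output: a stable format
  • Then add 10 regression cases: so iterations don't drift
  • Finally add tracing: so production issues can be diagnosed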