AiHubMix Documentation Hub

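Prompt caching is a key mechanism for reducing model inference cost. Prompt content that has already been processed is cached and reused in later requests, which cuts repeated computation, lowers cost, and speeds up responses.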

How it works

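When you send a request with prompt caching enabled, the system checks whether the prompt prefix is already cached from a recent query. If it is, the cached prefix is used, which reduces processing time and cost; otherwise, the full prompt is processed and the prefix is cached once the response begins. This is particularly useful for: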

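Prompts that contain many examples
Large amounts of context or background information
Repetitive tasks with consistent instructions
Long multi-turn conversations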
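
The lookup described above can be pictured with a toy model. This is only a sketch under stated assumptions, not any provider's implementation: real systems match on token prefixes with their own granularity and expiry rules. The `PrefixCache` class, its 300-second lifetime, the refresh-on-access behavior, and the hashing of the raw string are all hypothetical choices made for illustration.

```python
import hashlib
import time


class PrefixCache:
    """Toy model of prompt-prefix caching (illustrative only)."""

    def __init__(self, ttl_seconds=300.0):
        self.ttl_seconds = ttl_seconds
        self._expiry = {}  # prefix hash -> expiry timestamp

    @staticmethod
    def _key(prefix):
        return hashlib.sha256(prefix.encode("utf-8")).hexdigest()

    def lookup(self, prefix, now=None):
        """Return True on a cache hit; on a miss, store the prefix and return False."""
        now = time.time() if now is None else now
        key = self._key(prefix)
        hit = self._expiry.get(key, 0.0) > now
        # Assumption: hit or miss, the entry is (re)written so it stays fresh.
        self._expiry[key] = now + self.ttl_seconds
        return hit


cache = PrefixCache()
prefix = "You are an AI assistant. <long context>"
first = cache.lookup(prefix, now=0.0)     # miss: full prompt processed, prefix cached
second = cache.lookup(prefix, now=60.0)   # hit: prefix reused within the lifetime
third = cache.lookup(prefix, now=1000.0)  # miss: the entry expired in the meantime
```

Only an exact match on the prefix produces a hit in this model, which is why the later advice about keeping the prefix stable matters.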

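Core mechanisms

Caching support differs by model provider:

Automatic caching

Automatic caching needs no extra configuration: the system identifies and caches reusable content on its own. It applies to models such as OpenAI and DeepSeek.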

OpenAI

Minimum prompt length: 1024 tokens
Pricing: cache writes are free; cache reads cost 0.25x to 0.5x the regular price
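
As a rough worked example of the pricing rule above, the following sketch estimates input cost when part of a prompt is read from cache. The function name, the per-token `unit_price`, and the sample numbers are hypothetical; only the 1024-token minimum and the 0.25x to 0.5x read multiplier come from the text.

```python
MIN_CACHEABLE_TOKENS = 1024  # minimum prompt length stated above


def estimate_input_cost(prompt_tokens, cached_tokens, unit_price, read_multiplier=0.5):
    """Estimate input cost when `cached_tokens` of the prompt are read from cache.

    `unit_price` is a hypothetical price per input token; `read_multiplier`
    is the cache-read discount (0.25 to 0.5 according to the text above).
    """
    if not 0.25 <= read_multiplier <= 0.5:
        raise ValueError("read_multiplier is expected to be between 0.25 and 0.5")
    if prompt_tokens < MIN_CACHEABLE_TOKENS:
        cached_tokens = 0  # prompts below the minimum are not cached
    cached_tokens = min(cached_tokens, prompt_tokens)
    uncached_tokens = prompt_tokens - cached_tokens
    return uncached_tokens * unit_price + cached_tokens * unit_price * read_multiplier


# Hypothetical numbers: a 10,000-token prompt with 8,000 tokens served from cache.
full_cost = estimate_input_cost(10_000, 0, unit_price=1.0)
cached_cost = estimate_input_cost(10_000, 8_000, unit_price=1.0, read_multiplier=0.25)
```

With these made-up numbers the cached request costs 2,000 + 8,000 x 0.25 = 4,000 price units instead of 10,000.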

Gemini

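Implicit context caching is enabled by default; caching takes effect automatically with no manual configuration.
The cache is hit only when the content, model, and parameters are exactly the same; a difference in any field is treated as a new request and misses the cache.
The cache lifetime (TTL) can be set by the developer or left unset. If it is not specified, it defaults to 1 hour. There is no minimum or maximum duration; the cost depends on the number of cached tokens and how long they are stored.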

DeepSeek / Grok / Moonshot / Groq

Pricing: cache writes are free or charged at the regular price; cache reads cost less than the regular price

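Explicit caching for Claude models

Cache positions must be specified manually through cache_control
Cache granularity can be controlled precisely
Applies to Anthropic Claude models

OpenAI-compatible API

Cache breakpoints can be set with the cache_control field in system messages, user messages (including images), and tools. The examples below show only the key structure.

System message caching (default 5-minute TTL):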

{
  "model": "claude-opus-4-5",
  "messages": [
    {
      "role": "system",
      "content": [
        {"type": "text", "text": "You are an AI assistant"},
        {
          "type": "text",
          "text": "(long context)",
          "cache_control": {"type": "ephemeral"}
        }
      ]
    },
    {
      "role": "user",
      "content": [{"type": "text", "text": "Hello"}]
    }
  ]
}

User message caching (1-hour TTL):

{
  "model": "claude-opus-4-5",
  "messages": [
    {
      "role": "system",
      "content": [{"type": "text", "text": "You are an AI assistant"}]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "(long context)",
          "cache_control": {"type": "ephemeral", "ttl": "1h"}
        },
        {"type": "text", "text": "Hello"}
      ]
    }
  ]
}

Image message caching:

{
  "role": "user",
  "content": [
    {
      "type": "image_url",
      "image_url": {"detail": "auto", "url": "data:image/jpeg;base64,/9j/4AAQ..."},
      "cache_control": {"type": "ephemeral"}
    },
    {"type": "text", "text": "What's this?"}
  ]
}

Tool definition caching: cache_control goes at the top level of the tool object (at the same level as type and function):

{
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get current weather for a location",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    },
    "cache_control": {"type": "ephemeral", "ttl": "1h"}
  }]
}
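
The JSON structures above can also be assembled programmatically. The following is a minimal sketch that builds the system-message variant; the helper names `cached_text` and `build_request` are hypothetical, and the body is only constructed and printed, not sent to any endpoint.

```python
import json


def cached_text(text, ttl=None):
    """Return a text content block marked as a cache breakpoint."""
    cache_control = {"type": "ephemeral"}
    if ttl is not None:
        cache_control["ttl"] = ttl  # e.g. "1h"; omitted means the 5-minute default
    return {"type": "text", "text": text, "cache_control": cache_control}


def build_request(model, instructions, long_context, question, ttl=None):
    """Assemble a chat request body with the long context cached in the system message."""
    return {
        "model": model,
        "messages": [
            {
                "role": "system",
                "content": [
                    {"type": "text", "text": instructions},
                    cached_text(long_context, ttl),
                ],
            },
            {"role": "user", "content": [{"type": "text", "text": question}]},
        ],
    }


body = build_request(
    "claude-opus-4-5", "You are an AI assistant", "(long context)", "Hello", ttl="1h"
)
print(json.dumps(body, indent=2))
```

Keeping the breakpoint logic in one helper makes it harder to attach cache_control to the wrong block by accident.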

Anthropic-compatible API

curl https://aihubmix.com/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $AIHUBMIX_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-opus-4-6",
    "max_tokens": 1024,
    "system": [
      {
        "type": "text",
        "text": "You are an AI assistant tasked with analyzing literary works. Your goal is to provide insightful commentary on themes, characters, and writing style.\n"
      },
      {
        "type": "text",
        "text": "<the entire contents of Pride and Prejudice>",
        "cache_control": {"type": "ephemeral"}
      }
    ],
    "messages": [
      {
        "role": "user",
        "content": "Analyze the major themes in Pride and Prejudice."
      }
    ]
  }'

# Call the model again with the same input up to the cache checkpoint
curl https://aihubmix.com/v1/messages # rest of input
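
For readers who prefer Python, the same request can be sketched with the standard library. This is an approximation of the curl call above, not an official client: the function name `build_messages_request` and the `sk-placeholder` fallback key are hypothetical, the system instruction is shortened, and the request is built but not sent.

```python
import json
import os
import urllib.request


def build_messages_request(api_key, book_text, question):
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "model": "claude-opus-4-6",
        "max_tokens": 1024,
        "system": [
            {"type": "text", "text": "You are an AI assistant tasked with analyzing literary works."},
            # Cache breakpoint: the prefix up to and including this block is cached.
            {"type": "text", "text": book_text, "cache_control": {"type": "ephemeral"}},
        ],
        "messages": [{"role": "user", "content": question}],
    }
    return urllib.request.Request(
        "https://aihubmix.com/v1/messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "content-type": "application/json",
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
        },
        method="POST",
    )


request = build_messages_request(
    os.environ.get("AIHUBMIX_API_KEY", "sk-placeholder"),
    "<the entire contents of Pride and Prejudice>",
    "Analyze the major themes in Pride and Prejudice.",
)
# To send it: urllib.request.urlopen(request). A second call with the same
# prefix should then be able to read the cache instead of reprocessing it.
```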

Cache duration

Default: 5 minutes
Optional: 1 hour ("ttl": "1h")

For more information, see: Claude prompt caching

Usage recommendations

Keep the prefix stable

Put fixed content at the front of the prompt. Recommended structure:

[System settings / long text / RAG data]
[User question (the part that changes)]
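
The effect of this ordering can be checked with a small experiment. The sketch below compares how much of two prompts is shared when the fixed content comes first versus last; the helper names and sample strings are hypothetical, and character-level comparison stands in for the token-level matching a provider would actually do.

```python
import os


def build_prompt(static_context, user_question):
    """Put the fixed content first and the changing question last."""
    return f"{static_context}\n\nQuestion: {user_question}"


def shared_prefix_length(prompt_a, prompt_b):
    """Length in characters of the common prefix of two prompts."""
    return len(os.path.commonprefix([prompt_a, prompt_b]))


context = "System settings + long text + RAG data. " * 50
good_a = build_prompt(context, "What is the refund policy?")
good_b = build_prompt(context, "How do I reset my password?")

# Anti-pattern: the changing part comes first, so the prompts diverge immediately.
bad_a = f"Question: What is the refund policy?\n\n{context}"
bad_b = f"Question: How do I reset my password?\n\n{context}"

stable = shared_prefix_length(good_a, good_b)
unstable = shared_prefix_length(bad_a, bad_b)
```

In the recommended layout the whole context is part of the shared prefix; in the reversed layout the prompts share almost nothing, so nothing useful could be reused.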

Cache large text

Prioritize caching the following content:

RAG data
Long text
CSV / JSON data
Role settings

Control TTL

Short sessions → 5 minutes
Long sessions → 1 hour (more cost-effective)
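
One way to apply this rule in code is to pick the cache_control value from the expected pause between requests. This is a sketch of one possible policy, not a documented one: the function name and the choice of the 5-minute default lifetime as the switching threshold are assumptions.

```python
DEFAULT_TTL_SECONDS = 5 * 60  # default cache lifetime from the text above


def cache_control_for(expected_gap_seconds):
    """Pick a cache_control value from the expected pause between requests.

    The 5-minute threshold is an assumption for illustration: if follow-up
    requests normally arrive within the default lifetime, the default is
    enough; otherwise the 1-hour option is requested.
    """
    if expected_gap_seconds <= DEFAULT_TTL_SECONDS:
        return {"type": "ephemeral"}
    return {"type": "ephemeral", "ttl": "1h"}


short_session = cache_control_for(30)       # quick back-and-forth chat
long_session = cache_control_for(20 * 60)   # user returns every 20 minutes
```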

Reduce cache writes

Keep frequently changing content out of the cache: do not cache timestamps, user input variables, or other high-churn data.
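
A minimal sketch of this advice, assuming the content-block format from the earlier examples: the stable text sits in the cached block, and volatile values such as the current time go in a separate block after the breakpoint. The function name `build_user_content` and the sample dates are hypothetical.

```python
from datetime import datetime, timezone


def build_user_content(static_context, question, now=None):
    """Cached block holds only stable text; volatile values go after the breakpoint."""
    now = now or datetime.now(timezone.utc)
    return [
        {"type": "text", "text": static_context, "cache_control": {"type": "ephemeral"}},
        # Not cached: this block changes on every request.
        {"type": "text", "text": f"Current time: {now.isoformat()}\n{question}"},
    ]


first = build_user_content("(long context)", "Hello", datetime(2026, 1, 1, tzinfo=timezone.utc))
later = build_user_content("(long context)", "Hello again", datetime(2026, 1, 2, tzinfo=timezone.utc))
```

The cached block is identical across both requests, so the second one does not force a new cache write; had the timestamp been embedded in the first block, every request would have produced a different prefix.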

Last updated: 2026-06-01
