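Query Context vs. Filter Context: Stop Mixing Them Up

"Why does my term query return no data?"

"Why did my query get slower after I added sorting?"

"What exactly is the difference between must and filter inside bool?"

The answers to all three questions point to the same concept: the difference between query context and filter context. It is one of the most basic and most important concepts in ES search, and also one of the most overlooked.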

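The difference at a glance

| | Query Context | Filter Context |
|---|---|---|
| Core question | How well does this document match? | Does this document match or not? |
| Scoring | Computes `_score` | No score computed (the clause contributes nothing to `_score`) |
| Caching | Results are not cached | Eligible for automatic caching (LRU eviction) |
| Performance | Slower (has to compute scores) | Faster (no scoring, can use the cache) |
| Typical use | Full-text search, relevance ranking | Conditional filtering, permission checks |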

Which queries go where

Typically used in query context (scored)

{ "match": { "title": "iPhone" } }          // full-text match
{ "match_phrase": { "title": "iPhone 15" } } // phrase match
{ "multi_match": { ... } }                   // multi-field match
{ "query_string": { ... } }                  // query string

Typically used in filter context (not scored)

{ "term":  { "status": "published" } }       // exact match
{ "terms": { "tag": ["java", "es"] } }       // exact match against several values
{ "range": { "price": { "gte": 1000 } } }    // range
{ "exists": { "field": "description" } }     // field existence

Key point: the query type does not decide the context; its position inside bool does

GET /products/_search
{
  "query": {
    "bool": {
      "must": [     // ← query context (scored)
        { "match": { "title": "iPhone" } }
      ],
      "filter": [   // ← filter context (not scored, cacheable)
        { "term":  { "status": "published" } },
        { "range": { "price": { "gte": 5000, "lte": 15000 } } }
      ]
    }
  }
}

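The match inside must is scored: the better the title matches, the higher the score. The term and range inside filter are yes/no checks: a document is either in or out. They are not scored, and their results can be cached.

Why this matters: three real-world cases

Case 1: term written in must (correct, but wasteful)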
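To make the placement rule concrete, here is a minimal Python sketch that assembles the same request body as a plain dict. The helper name `build_search_body` and its parameters are hypothetical and not part of any Elasticsearch client; the returned dict is only assumed to be sent as the body of a `_search` request.

```python
import json


def build_search_body(text, status, min_price, max_price):
    """Build a bool query: scored full-text clause in must, yes/no conditions in filter."""
    return {
        "query": {
            "bool": {
                # Query context: this clause contributes to _score.
                "must": [{"match": {"title": text}}],
                # Filter context: these clauses only include or exclude documents.
                "filter": [
                    {"term": {"status": status}},
                    {"range": {"price": {"gte": min_price, "lte": max_price}}},
                ],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_search_body("iPhone", "published", 5000, 15000), indent=2))
```

Keeping the split inside one helper means callers cannot accidentally put a status or price condition into the scored part of the query.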

{
  "bool": {
    "must": [
      { "term": { "status": "published" } }
    ]
  }
}

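This works, but ES computes _score for every matching document, and for an exact match on a field like status that score is effectively the same for every hit, so it does nothing for ranking. Moving the clause to filter skips scoring and makes it eligible for caching; under high concurrency the performance gap is significant.

Case 2: match written in filter (no error, but no scoring)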

{
  "bool": {
    "filter": [
      { "match": { "title": "iPhone" } }
    ]
  }
}

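ES allows match inside filter, but it will not compute a relevance score. Every matching document is returned, but in what order? All hits score 0, so the order has nothing to do with relevance, which is not the result you wanted.

Case 3: full-text search and filter conditions mixed in must (slower)

// ❌ Bad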
{
  "bool": {
    "must": [
      { "match": { "title": "iPhone" } },
      { "term":  { "status": "published" } },
      { "range": { "price": { "gte": 5000 } } }
    ]
  }
}

// ✅ Good
{
  "bool": {
    "must": [
      { "match": { "title": "iPhone" } }
    ],
    "filter": [
      { "term":  { "status": "published" } },
      { "range": { "price": { "gte": 5000 } } }
    ]
  }
}

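The first query scores all three clauses for every matching document; the second scores only the title clause and handles the rest as cacheable filters. At the scale of 100,000 documents, the speed difference can be 3-5x.

constant_score: forcing a query into a filter

Sometimes you need match to do the matching (it is more flexible than analyzing the text yourself and handing the terms to a filter), but you do not need a score: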
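The rewrite from the first form to the second is mechanical, so it can be sketched as a small Python function over the query dict. The function name and the `NON_SCORING_TYPES` set are assumptions made for illustration; the set only covers the query types used in this article and is not an exhaustive list.

```python
import copy

# Query types treated as yes/no conditions here (illustrative assumption, not exhaustive).
NON_SCORING_TYPES = {"term", "terms", "range", "exists"}


def _as_list(clauses):
    """bool accepts a single clause object or a list; normalize to a list."""
    if clauses is None:
        return []
    return [clauses] if isinstance(clauses, dict) else list(clauses)


def move_exact_clauses_to_filter(bool_query):
    """Return a copy of a bool query with exact-match clauses moved from must to filter."""
    result = copy.deepcopy(bool_query)  # leave the caller's dict untouched
    body = result["bool"]
    scored, unscored = [], []
    for clause in _as_list(body.get("must")):
        query_type = next(iter(clause))  # each clause has one top-level key
        (unscored if query_type in NON_SCORING_TYPES else scored).append(clause)
    filters = _as_list(body.get("filter")) + unscored
    body.pop("must", None)
    body.pop("filter", None)
    if scored:
        body["must"] = scored
    if filters:
        body["filter"] = filters
    return result
```

Note that moving a clause out of must also removes its contribution to _score, so absolute scores change after the rewrite. If every must clause is moved, the query ends up filter-only and all hits score 0.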

{
  "query": {
    "constant_score": {
      "filter": {
        "match": { "title": "iPhone" }
      }
    }
  }
}

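constant_score gives every document matched by the inner query the same fixed score (the boost, 1.0 by default) and runs the inner query in filter context, so it is eligible for caching. It fits the "I just want these documents, ranking doesn't matter" scenario, for example autocomplete or deduplicated lists.

How the cache works

ES's filter cache (the query cache) is an LRU cache that works at **segment** granularity: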
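When queries are built in application code, the wrapping step can be a one-line helper. This is a minimal sketch: `as_constant_score` is a hypothetical name, while `filter` and `boost` are the parameters of the constant_score query itself.

```python
def as_constant_score(query, boost=1.0):
    """Wrap any query so it matches as before but every hit gets the same score (boost)."""
    return {"constant_score": {"filter": query, "boost": boost}}


if __name__ == "__main__":
    body = {"query": as_constant_score({"match": {"title": "iPhone"}})}
    print(body)
```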

filter( status=published ) → segment 1: [doc1, doc2, doc5, doc9...]  → cached
                             segment 2: [doc12, doc15...]             → cached

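Cached entries belong to a segment. When segments are merged away, their entries go with them, and every refresh creates new segments that start with nothing cached. That is why:

On indices that refresh frequently, the filter cache hit rate is low
On old indices that no longer receive writes, the filter cache hit rate approaches 100%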
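The hit-rate behaviour can be illustrated with a toy model of a per-segment LRU cache. This is only a sketch to build intuition, not Elasticsearch's actual implementation; the class name, method names, and segment ids are all made up for the example.

```python
from collections import OrderedDict


class SegmentFilterCache:
    """Toy LRU cache keyed by (segment, filter). Not Elasticsearch's real code."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = OrderedDict()  # (segment_id, filter_key) -> set of matching doc ids
        self.hits = 0
        self.misses = 0

    def lookup(self, segment_id, filter_key, compute):
        key = (segment_id, filter_key)
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        docs = compute()  # run the filter against this segment
        self.entries[key] = docs
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return docs

    def drop_segment(self, segment_id):
        """Forget everything cached for a segment that no longer exists (e.g. after a merge)."""
        for key in [k for k in self.entries if k[0] == segment_id]:
            del self.entries[key]
```

In this model, an index whose segments keep changing pays a miss for each new segment, while a static index keeps answering from the same entries.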

# Check cache utilization
GET /_nodes/stats/indices/query_cache?pretty
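The response reports `hit_count` and `miss_count` for each node under `indices.query_cache`, so a hit rate is easy to derive. The sketch below works on an already parsed response dict; the function name is hypothetical and the sample numbers are invented purely for illustration.

```python
def query_cache_hit_rates(stats):
    """Map node name -> query cache hit rate from a parsed nodes-stats response."""
    rates = {}
    for node_id, node in stats.get("nodes", {}).items():
        cache = node["indices"]["query_cache"]
        lookups = cache["hit_count"] + cache["miss_count"]
        # Avoid dividing by zero on a node that has served no cacheable lookups yet.
        rates[node.get("name", node_id)] = cache["hit_count"] / lookups if lookups else 0.0
    return rates


# Invented sample shaped like the API response (numbers are not real measurements).
SAMPLE_STATS = {
    "nodes": {
        "abc123": {
            "name": "node-1",
            "indices": {"query_cache": {"hit_count": 750, "miss_count": 250}},
        }
    }
}

if __name__ == "__main__":
    print(query_cache_hit_rates(SAMPLE_STATS))
```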

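Practical checklist

Full-text search conditions → must
Exact filtering (term/terms/range/exists) → filter
In a bool combination, put only the clauses that genuinely need scoring in must
Full-text matching that needs no ranking → constant_score + filter
When _score drives the final sort, make sure must contains at least one query that produces a meaningful score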

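Summary

| Remember | Forget |
|---|---|
| filter can use the cache; must does not | Treating must and filter as equally fast (filter is 3-5x faster in most scenarios) |
| term/range go in filter | term in must |
| match goes in must | match in filter (unless you do not need ranking) |

In one sentence: if it can go in filter, do not put it in must; if you can skip scoring, skip it. Every _score you do not compute makes ES a little faster.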