Field matching: match

The simplest query is match_all, which matches all documents.

GET /_search
{
    "query": {
        "match_all": {}
    }
}

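A basic match query example. The query string "this is a test" is run through the analyzer of the message field, and the resulting terms are matched against message, combined with the default or operator. With an analyzer that removes the stop words ["is", "a"], only the keywords ["this", "test"] are searched. Note that the default standard analyzer does not remove stop words, so with it all four terms take part in the query.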

GET /_search
{
    "query": {
        "match" : {
            "message" : "this is a test"
        }
    }
}

info

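The match query accepts fields of type text, numeric, and date.
Set the lenient parameter to true to ignore exceptions caused by data-type mismatches, such as querying a numeric field with a text query string. It defaults to false.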

operator

Set operator to and to require every query term to match, which gives more precise results. The default is or.

GET /_search
{
    "query": {
        "match" : {
            "message" : {
                "query" : "this is a test",
                "operator" : "and"
            }
        }
    }
}
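
The difference between or and and can be sketched with a small Python model. This is only an illustration, not how Elasticsearch evaluates the query: the analyze helper is a hypothetical stand-in that lowercases and splits on whitespace, and documents are plain strings.

```python
def analyze(text):
    # Hypothetical stand-in for a real analyzer: lowercase, split on whitespace.
    return text.lower().split()


def match(doc_text, query_text, operator="or"):
    """Toy model of how `operator` combines the analyzed query terms."""
    doc_terms = set(analyze(doc_text))
    hits = [term in doc_terms for term in analyze(query_text)]
    # "or": at least one term must be present; "and": every term must be.
    return all(hits) if operator == "and" else any(hits)


print(match("a quick test", "this is a test"))                  # True
print(match("a quick test", "this is a test", operator="and"))  # False
```

The document "a quick test" contains "a" and "test", so it matches with or, but it lacks "this" and "is", so it is rejected with and.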

minimum_should_match

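minimum_should_match sets the minimum number of query terms that must match. See the configuration reference for the accepted values.

In the example below, at least 2 of the query terms must match.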

GET /_search
{
    "query": {
        "match" : {
            "message" : {
                "query" : "hello world",
                "operator" : "or",
                "minimum_should_match": 2
            }
        }
    }
}
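
As a rough sketch of the idea, the Python model below counts how many distinct query terms occur in a document and compares that count with minimum_should_match. It assumes whitespace tokenization and models only plain integer values; the real parameter also accepts other forms such as percentages.

```python
def matches_minimum(doc_text, query_text, minimum_should_match=1):
    """Toy model: require at least N distinct query terms in the document."""
    doc_terms = set(doc_text.lower().split())
    query_terms = set(query_text.lower().split())
    matched = len(query_terms & doc_terms)
    return matched >= minimum_should_match


print(matches_minimum("hello there", "hello world", 2))      # False
print(matches_minimum("hello big world", "hello world", 2))  # True
```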

Fuzziness

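fuzziness allows inexact, fuzzy matching. See the configuration reference for the accepted values.

Fuzzy transpositions (ab → ba) are allowed by default and can be disabled by setting fuzzy_transpositions to false.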

GET /_search
{
    "query": {
        "match" : {
            "message" : {
                "query" : "this is a testt",
                "fuzziness": "AUTO"
            }
        }
    }
}
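
To make the effect of transpositions and of AUTO concrete, here is a minimal Python sketch. It assumes the optimal string alignment variant of Damerau-Levenshtein distance and the documented default AUTO thresholds (terms of 0 to 2 characters must match exactly, 3 to 5 characters allow one edit, longer terms allow two). The function names are illustrative and are not part of any Elasticsearch API.

```python
def edit_distance(a, b, transpositions=True):
    """Edit distance; a swap of adjacent characters costs 1 if enabled."""
    # d[i][j] = number of edits needed to turn a[:i] into b[:j]
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def auto_max_edits(term):
    """Allowed edits under the default AUTO thresholds (3 and 6)."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


print(edit_distance("ab", "ba"))                        # 1
print(edit_distance("ab", "ba", transpositions=False))  # 2
print(edit_distance("testt", "test") <= auto_max_edits("testt"))  # True
```

In this model the misspelled term "testt" has five characters, so AUTO allows one edit, which is enough to reach "test".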

info

Note that fuzzy matching is not applied to terms that have synonyms.

Zero terms query

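If the analyzer in use removes every token from the query (for example through a stop-word filter), the default behavior is to match no documents. To change this, use the zero_terms_query option, which accepts none (the default) and all, where all corresponds to a match_all query.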

GET /_search
{
    "query": {
        "match" : {
            "message" : {
                "query" : "to be or not to be",
                "operator" : "and",
                "zero_terms_query": "all"
            }
        }
    }
}
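
The behavior can be sketched in Python under a few assumptions: a hypothetical stop-word list, whitespace tokenization, and and semantics for the remaining terms. It is only meant to show what happens when the analyzer leaves no terms at all.

```python
STOP_WORDS = {"to", "be", "or", "not"}  # hypothetical stop-word list


def analyze(text):
    # Lowercase, split on whitespace, and drop stop words.
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def search(docs, query_text, zero_terms_query="none"):
    """Toy model of zero_terms_query; remaining terms use `and` semantics."""
    terms = analyze(query_text)
    if not terms:
        # The analyzer removed every term from the query.
        return list(docs) if zero_terms_query == "all" else []
    return [d for d in docs if all(t in analyze(d) for t in terms)]


docs = ["to be or not to be", "hamlet speaks"]
print(search(docs, "to be or not to be"))                          # []
print(search(docs, "to be or not to be", zero_terms_query="all"))  # both documents
```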

cutoff_frequency

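cutoff_frequency splits the terms of the query string into a low-frequency group and a high-frequency group. The low-frequency group (the more important terms) makes up the bulk of the query, while the high-frequency group (the less important terms) is used only for scoring and does not take part in matching. Treating the two groups differently can improve search performance considerably. Note that the parameter is deprecated since Elasticsearch 7.3.0.

Take the following query as an example: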

GET /_search
{
    "query": {
        "match" : {
            "message" : {
                "query" : "Shakespeare says, to be or not to be",
                "cutoff_frequency" : 0.01
            }
        }
    }
}

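Any term that appears in more than 1% of the documents is considered a high-frequency term. In the example above, the frequent stop words to, be, or, and not are classed as high-frequency terms, while the rarer Shakespeare and says are low-frequency terms. The query is rewritten into the following bool query: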

{
  "bool": {
    "must": {
      "bool": {
        "should": [
          { "term": { "message": "shakespeare" }},
          { "term": { "message": "says" }}
        ]
      }
    },
    "should": {
      "bool": {
        "should": [
          { "term": { "message": "to" }},
          { "term": { "message": "be" }},
          { "term": { "message": "or" }},
          { "term": { "message": "not" }}
        ]
      }
    }
  }
}

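The search is driven by the more important low-frequency terms, while the high-frequency terms only raise the score, which yields better result relevance.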
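
The split into the two groups can be sketched as follows. This Python model is an assumption-laden illustration: documents are sets of terms, only a relative cutoff (a fraction of all documents) is modeled, and the helper names are hypothetical. The rewrite_as_bool helper merely mirrors the shape of the rewritten bool query shown above.

```python
def split_by_cutoff(query_terms, docs, cutoff_frequency):
    """Split query terms into (low_frequency, high_frequency) groups."""
    low, high = [], []
    for term in query_terms:
        # Fraction of documents that contain the term.
        doc_freq = sum(1 for doc in docs if term in doc) / len(docs)
        (high if doc_freq > cutoff_frequency else low).append(term)
    return low, high


def rewrite_as_bool(field, low, high):
    """Build a bool query: low-frequency terms select, high-frequency terms score."""
    def terms(group):
        return [{"term": {field: t}} for t in group]
    return {"bool": {
        "must": {"bool": {"should": terms(low)}},
        "should": {"bool": {"should": terms(high)}},
    }}


docs = [
    {"to", "be", "shakespeare"},
    {"to", "be", "or", "not"},
    {"to", "be", "says"},
    {"to", "be", "or", "not"},
]
query = ["shakespeare", "says", "to", "be", "or", "not"]
low, high = split_by_cutoff(query, docs, cutoff_frequency=0.4)
print(low)   # ['shakespeare', 'says']
print(high)  # ['to', 'be', 'or', 'not']
```

The tiny corpus uses a cutoff of 0.4 rather than 0.01 only so that the split is visible with four documents.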
