[Recipes] Compound Bool Queries in Elasticsearch (Query and Filter Context)

Problem: 

Write Query DSL to combine multiple queries in a logical fashion using "bool". Write nested queries combining occurrence types (must, must_not). Write queries with a nested filter context. 

Solution Summary: 

Elasticsearch Query DSL consists of two types of clauses:

  1. Leaf query clauses look for a particular value in a particular field, such as the match, term or range queries. See previous recipes.

  2. Compound query clauses wrap other leaf or compound queries and are used to combine multiple queries in a logical fashion (such as the bool or dis_max query), or to alter their behaviour (such as the constant_score query). This recipe covers the bool query.

 

Bool Query
A bool query matches documents that satisfy a boolean combination of other queries. It is built from one or more boolean clauses, each with a typed occurrence. The occurrence types are: must, filter, should, must_not.

The bool query maps to Lucene BooleanQuery.
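
As a minimal sketch of how the four occurrence types fit together, the request below uses all of them at once. The field names are the ones used elsewhere in this recipe; the specific terms and the balance bound are only illustrative assumptions.

GET /accounts/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "lane" } }
      ],
      "filter": [
        { "range": { "balance": { "gte": 20000 } } }
      ],
      "should": [
        { "match": { "address": "mill" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}

Here the must and should clauses contribute to the relevance score, while the filter and must_not clauses only include or exclude documents and do not affect scoring.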

 

Nested Filter Context

By default, clauses run in query context, where Elasticsearch computes a relevance score for each match. For yes/no conditions such as a "range" check, relevance does not matter, and in query context every match simply receives the same constant score. Such clauses can instead go under the filter context, which does not compute relevance. Elasticsearch may also cache filter context results, giving a performance edge over query context.

Prerequisites: 

Set up the accounts index from accounts.json as explained here.

Solution Steps: 

Case 1 - Return all accounts containing "mill" and "lane" in the address using bool

GET /accounts/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}

Note:

  1. Replacing “must” with “must_not” returns all accounts that contain neither "mill" nor "lane" in the address.

  2. Replacing “must” with “should” returns all accounts containing "mill" or "lane" in the address.
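
The "should" variant from the note can be sketched as follows. When a bool query has no must or filter clause, a document has to match at least one should clause anyway, so minimum_should_match is spelled out here only to make that behaviour explicit.

GET /accounts/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ],
      "minimum_should_match": 1
    }
  }
}

Accounts whose address contains both terms match both clauses and so generally score higher than accounts matching only one.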

 

Case 2 – Nested Query: Accounts of anybody who is older than 40 but doesn’t live in the state ID

GET /accounts/_search
{
  "query": {
    "bool": {
      "must": [
        { "range": { "age": { "gt": 40 } } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}

 

Case 3 - Bool query with a nested filter context (range query)

GET /accounts/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 25000,
            "lte": 30000
          }
        }
      }
    }
  }
}

Note:

  1. This query returns all accounts with balances between 25000 and 30000, inclusive.

  2. You can substitute any other queries into the query and the filter parts.

  3. In addition to the match_all, match, bool, and range queries, Elasticsearch offers many other query types.

  4. Alternatively, we can simply use the range element directly within the query element.
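
A minimal sketch of that alternative, reusing the bounds from the Case 3 request:

GET /accounts/_search
{
  "query": {
    "range": {
      "balance": {
        "gte": 25000,
        "lte": 30000
      }
    }
  }
}

This should match the same accounts, but the range clause now runs in query context rather than filter context, so it is not the form to prefer when the condition is a pure yes/no filter.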
