[Recipes] Elasticsearch Aggregations with Query and Filter

Problem: 

Combine Elasticsearch aggregations with query and filter. 

Solution Summary: 

We can use queries and filters with Elasticsearch aggregations so that aggregations will then run only for the query or filter context. Elastic search implicitely uses a match_all query.

Prerequisites: 

Set up accounts index from accounts.json as explained in the link. Make sure you add opening_date field as given in the link.

Solution Steps: 

Case 1 - Metric Aggregation with Query

GET accounts/_search
{
  "query" : {
    "exists": {
      "field": "opening_date"
    }
  },
  "aggs" : {
    "with_opening_date_bal_sum" : {
      "sum" : {
        "field" : "balance"
      }
    }
  },
  "size": 0
}

Note: If you use "filter" in the top level replacing "query", you will get exception.

 

Result:

{
  "took": 17,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 10,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "with_opening_date_bal_sum_query": {
      "value": 334631
    }
  }
}

Note: "took" is the time in milliseconds for Elasticsearch to execute the search. It varied between 4 to 17 in various runs.

 

Case 2 - Metric Aggregation with Filter

GET accounts/_search
{
  "aggs" : {
    "with_opening_date" : {
      "filter" : {
        "exists": {
          "field": "opening_date"
        }
      },
      "aggs": {
        "with_opening_date_bal_sum": {
          "sum": {
            "field":"balance"
          }
        }
      }
    }
  },
  "size": 0
}

Note: If you use "query" replacing "filter", you will get exception.

 

Response:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "with_opening_date": {
      "doc_count": 10,
      "with_opening_date_bal_sum": {
        "value": 334631
      }
    }
  }
}

 

Case 3 - Nested Bucket Aggregation with Query

GET accounts/_search
{
  "query" : {
    "exists": {
      "field": "opening_date"
    }
  },
  "aggs" : {
    "opening_date_terms" : {
      "terms" : {
        "field":"opening_date",
        "missing": "2017/12/31"
      },
      "aggs": {
        "opening_date_stats": {
          "stats": {
            "field":"opening_date"
          }
        }
      }
    }
  },
  "size": 0
}

 

Response contains:

...

"aggregations": {
    "opening_date_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 1514764800000,
          "key_as_string": "2018/01/01 00:00:00",
          "doc_count": 2,
          "opening_date_stats": {
            "count": 2,
            "min": 1514764800000,
            "max": 1514764800000,
            "avg": 1514764800000,
            "sum": 3029529600000,
            "min_as_string": "2018/01/01 00:00:00",
            "max_as_string": "2018/01/01 00:00:00",
            "avg_as_string": "2018/01/01 00:00:00",
            "sum_as_string": "2066/01/01 00:00:00"
          }
        },

...

 

Case 4 - Nested Bucket Aggregation with Filter

GET accounts/_search
{
  "aggs" : {
    "with_opening_date" : {
      "filter" : {
        "exists": {
          "field": "opening_date"
        }
      },
      "aggs": {
        "opening_date_terms": {
          "terms": {
            "field":"opening_date",
            "missing": "2017/12/31"
          },
          "aggs": {
            "opening_date_stats": {
              "stats": {
                "field":"opening_date"
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}

 

Response contains:

...

"aggregations": {
    "with_opening_date": {
      "doc_count": 10,
      "opening_date_terms": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": 1514764800000,
            "key_as_string": "2018/01/01 00:00:00",
            "doc_count": 2,
            "opening_date_stats": {
              "count": 2,
              "min": 1514764800000,
              "max": 1514764800000,
              "avg": 1514764800000,
              "sum": 3029529600000,
              "min_as_string": "2018/01/01 00:00:00",
              "max_as_string": "2018/01/01 00:00:00",
              "avg_as_string": "2018/01/01 00:00:00",
              "sum_as_string": "2066/01/01 00:00:00"
            }
          },

...

Recipe Tags: 

Learn Serverless from Serverless Programming Cookbook

Contact

Please first use the contact form or facebook page messaging to connect.

Offline Contact
We currently connect locally for discussions and sessions at Bangalore, India. Please follow us on our facebook page for details.
WhatsApp (Primary): (+91) 7411174113
Phone (Escalations): (+91) 7411174114

Business newsletter

Complete the form below, and we'll send you an e-mail every now and again with all the latest news.

About

CloudMaterials is my blog to share notes and learning materials on Cloud and Data Analytics. My current focus is on Microsoft Azure and Amazon Web Services (AWS).

I like to write and I try to document what I learn to share with others. I believe that knowledge is useless unless you share it; the more you share, the more you learn.

Recent comments

Photo Stream