[Recipes] Bucket Aggregations - Terms Aggregation

Problem: 

Demonstrate bucket aggregations using the terms aggregation.

Solution Summary: 

Bucket aggregations create buckets of documents and assign documents to those buckets based on some criterion.

Prerequisites: 

Set up the accounts index from accounts.json as explained in the link.

Solution Steps: 

Case 1 - Terms Aggregation with Many Buckets

GET accounts/_search
{
  "aggs" : {
    "state_terms" : {
      "terms" : {
        "field":"state.keyword"
      }
    }
  },
  "size": 0
}

 

Response contains:

"aggregations": {
    "state_terms": {
      "doc_count_error_upper_bound": 20,
      "sum_other_doc_count": 770,
      "buckets": [
        {
          "key": "ID",
          "doc_count": 27
        },
        {
          "key": "TX",
          "doc_count": 27
        },
        {
          "key": "AL",
          "doc_count": 25
        },
        {
          "key": "MD",
          "doc_count": 25
        },
        {
          "key": "TN",
          "doc_count": 23
        },
        {
          "key": "MA",
          "doc_count": 21
        },
        {
          "key": "NC",
          "doc_count": 21
        },
        {
          "key": "ND",
          "doc_count": 21
        },
        {
          "key": "ME",
          "doc_count": 20
        },
        {
          "key": "MO",
          "doc_count": 20
        }
      ]
    }
  }

 

Note:

  1. Elasticsearch returns buckets only for the top unique keys (10 by default). The combined document count of all remaining buckets is returned as "sum_other_doc_count".

  2. The coordinating node gathers results from the shards of the index and merges them into the response for a query. Each shard sends only its own top n terms based on configuration, so document counts may be approximate, as explained here.  

    1. The value of "doc_count_error_upper_bound" is the maximum potential document count for a term that did not make it into the final list of terms. It is calculated as the sum of the document counts of the last term returned from each shard. 

    2. We can also enable a per-bucket document count error by setting the show_term_doc_count_error parameter to true. With this setting, every bucket will have its own doc_count_error_upper_bound.

    3. Accuracy can be improved by compromising on performance, for example by increasing the shard_size parameter so that each shard returns more candidate terms. 
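The settings mentioned in the notes above can be combined in a single request. The sketch below (parameter values are illustrative, not tuned) raises the number of returned buckets with size, asks each shard for more candidate terms with shard_size to improve accuracy, and enables the per-bucket error with show_term_doc_count_error:

GET accounts/_search
{
  "aggs" : {
    "state_terms" : {
      "terms" : {
        "field": "state.keyword",
        "size": 20,
        "shard_size": 100,
        "show_term_doc_count_error": true
      }
    }
  },
  "size": 0
}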

 

Case 2 - Terms Aggregation with Fewer Buckets

GET accounts/_search
{
  "aggs" : {
    "state_terms" : {
      "terms" : {
        "field":"opening_date"
      }
    }
  },
  "size": 0
}

Note: opening_date has only 5 distinct values and was added here to only a few documents.

Response contains:
"aggregations": {
    "state_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 1514764800000,
          "key_as_string": "2018/01/01 00:00:00",
          "doc_count": 2
        },
        {
          "key": 1517702400000,
          "key_as_string": "2018/02/04 00:00:00",
          "doc_count": 2
        },
        {
          "key": 1520553600000,
          "key_as_string": "2018/03/09 00:00:00",
          "doc_count": 2
        },
        {
          "key": 1523836800000,
          "key_as_string": "2018/04/16 00:00:00",
          "doc_count": 2
        },
        {
          "key": 1527206400000,
          "key_as_string": "2018/05/25 00:00:00",
          "doc_count": 2
        }
      ]
    }
  }

 

Case 3 - Put missing values into a bucket with a default key

GET accounts/_search
{
  "aggs" : {
    "state_terms" : {
      "terms" : {
        "field":"opening_date",
        "missing": "2017/12/31"
      }
    }
  },
  "size": 0
}

 

Response contains:

"aggregations": {
    "state_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 1514678400000,
          "key_as_string": "2017/12/31 00:00:00",
          "doc_count": 990
        },
        {
          "key": 1514764800000,
          "key_as_string": "2018/01/01 00:00:00",
          "doc_count": 2
        },
        {
          "key": 1517702400000,
          "key_as_string": "2018/02/04 00:00:00",
          "doc_count": 2
        },
        {
          "key": 1520553600000,
          "key_as_string": "2018/03/09 00:00:00",
          "doc_count": 2
        },
        {
          "key": 1523836800000,
          "key_as_string": "2018/04/16 00:00:00",
          "doc_count": 2
        },
        {
          "key": 1527206400000,
          "key_as_string": "2018/05/25 00:00:00",
          "doc_count": 2
        }
      ]
    }
  }

 

Case 4 - Set minimum count for buckets

GET accounts/_search
{
  "aggs" : {
    "state_terms" : {
      "terms" : {
        "field":"opening_date",
        "missing": "2017/12/31",
        "min_doc_count": 3
      }
    }
  },
  "size": 0
}

 

Response contains:

"aggregations": {
    "state_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 1514678400000,
          "key_as_string": "2017/12/31 00:00:00",
          "doc_count": 990
        }
      ]
    }
  }

Note: Buckets with fewer than min_doc_count documents are not returned. The default value of min_doc_count is 1, so by default, buckets with no documents are not returned (unless you set min_doc_count to 0).
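A related sketch: with min_doc_count set to 0, the terms aggregation can also return empty buckets for terms that match no documents in the current search, for example when a query filter excludes them:

GET accounts/_search
{
  "query": {
    "term": { "state.keyword": "TX" }
  },
  "aggs" : {
    "state_terms" : {
      "terms" : {
        "field": "state.keyword",
        "min_doc_count": 0
      }
    }
  },
  "size": 0
}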

 

TODO

  1. Add ordering to any one of the above bucket aggregations.

    1. order: { "_term" : "asc" }

  2. Try doing terms bucket aggregation across indexes.
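A sketch for the first TODO item (not yet verified against the accounts index): recent Elasticsearch versions order buckets by key using _key, which replaces the deprecated _term order key:

GET accounts/_search
{
  "aggs" : {
    "state_terms" : {
      "terms" : {
        "field": "state.keyword",
        "order": { "_key": "asc" }
      }
    }
  },
  "size": 0
}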
