[Recipes] Elasticsearch Histogram and Date Histogram

Problem: 

Group the values of a numeric or date field into fixed-size intervals (buckets) and count the documents that fall into each one.

Solution Summary: 

A histogram aggregation divides the entire range of values of a field into a series of fixed-size intervals (buckets) and returns the number of documents in each.

We can use "min_doc_count" to specify the minimum number of documents that must be present in a bucket for it to be returned.

We can use "extended_bounds" with its "min" and "max" properties to extend the buckets down to a lower limit and up to an upper limit. You will also need to set "min_doc_count" to 0 to see buckets that contain no documents.

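Date Histogram is similar to Histogram, but it works on date fields and we specify a calendar expression for the "interval", with values: year, quarter, month, week, day, hour, minute, second. In Elasticsearch 7.2 and later, "interval" is deprecated for date_histogram in favor of "calendar_interval" and "fixed_interval".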

Prerequisites: 

Set up accounts index from accounts.json as explained in the link

Solution Steps: 

Case 1 - Simple Histogram For Age

GET accounts/_search
{
  "aggs": {
    "age_distribution": {
      "histogram": {
        "field": "age",
        "interval": 10
      }
    }
  },
  "size": 0
}

 

Response contains:

...

"aggregations": {
    "age_distribution": {
      "buckets": [
        {
          "key": 20,
          "doc_count": 451
        },
        {
          "key": 30,
          "doc_count": 504
        },
        {
          "key": 40,
          "doc_count": 45
        }
      ]
    }
  }
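To make the bucket keys in the response easier to read, here is a minimal Python sketch of how a value is assigned to a bucket: the key is the value rounded down to the nearest multiple of the interval. This is only an illustration and not how Elasticsearch is implemented; the function name `histogram_buckets` and the sample ages are made up for the example, and the sketch does not model gap filling.

```python
import math
from collections import Counter


def histogram_buckets(values, interval):
    """Count values per bucket; each key is the lower bound of its interval."""
    # Round every value down to the nearest multiple of the interval.
    counts = Counter(math.floor(v / interval) * interval for v in values)
    return [{"key": key, "doc_count": counts[key]} for key in sorted(counts)]


# Hypothetical ages, not taken from the accounts index.
ages = [20, 25, 29, 30, 31, 39, 40]
buckets = histogram_buckets(ages, 10)
print(buckets)
```

With an interval of 10, an age of 29 lands in the bucket with key 20 and an age of 30 starts the bucket with key 30, which is why the keys in the response above are 20, 30 and 40.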

 

Case 2 - Do not return buckets with fewer than 100 documents

We will use "min_doc_count" to set the minimum number of documents a bucket must contain to be returned. We will also use an interval of 2.

GET accounts/_search
{
  "aggs": {
    "age_distribution": {
      "histogram": {
        "field": "age",
        "interval": 2,
        "min_doc_count": 100
      }
    }
  },
  "size": 0
}

 

Response contains:

...

"aggregations": {
    "age_distribution": {
      "buckets": [
        {
          "key": 30,
          "doc_count": 108
        },
        {
          "key": 32,
          "doc_count": 102
        },
        {
          "key": 34,
          "doc_count": 101
        }
      ]
    }
  }
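The effect of "min_doc_count" can be pictured as a filter applied after the documents have been counted per bucket. The Python sketch below shows that idea on a small made-up list of ages; the function name and the threshold of 3 are assumptions chosen for the example, not values from the accounts data.

```python
from collections import Counter


def histogram_with_min_doc_count(values, interval, min_doc_count):
    """Bucket integer values, then keep only buckets with enough documents."""
    counts = Counter((v // interval) * interval for v in values)
    return [
        {"key": key, "doc_count": counts[key]}
        for key in sorted(counts)
        if counts[key] >= min_doc_count
    ]


# Hypothetical ages: the bucket with key 36 holds a single value.
ages = [30, 30, 31, 32, 33, 33, 33, 36]
filtered = histogram_with_min_doc_count(ages, 2, 3)
print(filtered)
```

The bucket with key 36 is dropped because it contains fewer documents than the threshold, in the same way the request above drops every age bucket with fewer than 100 documents.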

 

Case 3 - Extend Boundaries

We will use "extended_bounds" and also set "min_doc_count" to 0 so that the empty buckets appear in the results.

GET accounts/_search
{
  "aggs": {
    "age_distribution": {
      "histogram": {
        "field": "age",
        "interval": 10,
        "min_doc_count": 0,
        "extended_bounds": {
          "min": 10,
          "max": 50
        }
      }
    }
  },
  "size": 0
}

 

Response contains:

...

"aggregations": {
    "age_distribution": {
      "buckets": [
        {
          "key": 10,
          "doc_count": 0
        },
        {
          "key": 20,
          "doc_count": 451
        },
        {
          "key": 30,
          "doc_count": 504
        },
        {
          "key": 40,
          "doc_count": 45
        },
        {
          "key": 50,
          "doc_count": 0
        }
      ]
    }
  }
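The empty buckets with keys 10 and 50 appear because "extended_bounds" widens the range of buckets beyond the values actually present in the data. It does not filter anything: data outside the bounds would still get its own buckets. A small Python sketch of that behaviour, assuming integer values and using made-up ages and a hypothetical function name:

```python
from collections import Counter


def histogram_with_extended_bounds(values, interval, bounds_min, bounds_max):
    """Bucket integer values and pad the result with empty buckets up to the bounds."""
    counts = Counter((v // interval) * interval for v in values)
    # The range covers both the data and the requested bounds.
    first = min([*counts, (bounds_min // interval) * interval])
    last = max([*counts, (bounds_max // interval) * interval])
    return [
        {"key": key, "doc_count": counts.get(key, 0)}
        for key in range(first, last + interval, interval)
    ]


# Hypothetical ages, all between 20 and 49.
ages = [22, 27, 35, 41]
padded = histogram_with_extended_bounds(ages, 10, 10, 50)
print(padded)
```

The first and last buckets come back with a "doc_count" of 0, mirroring the response above.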

 

Case 4 - Date Histogram

GET accounts/_search
{
  "aggs": {
    "opening_date_distribution": {
      "date_histogram": {
        "field": "opening_date",
        "interval": "quarter"
      }
    }
  },
  "size": 0
}

 

Response contains:

"aggregations": {
    "opening_date_distribution": {
      "buckets": [
        {
          "key_as_string": "2018/01/01 00:00:00",
          "key": 1514764800000,
          "doc_count": 6
        },
        {
          "key_as_string": "2018/04/01 00:00:00",
          "key": 1522540800000,
          "doc_count": 4
        }
      ]
    }
  }
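The "key" of each date bucket is the start of the quarter expressed in milliseconds since the epoch, and "key_as_string" is the same instant formatted as a date. The Python sketch below reproduces that mapping for a few made-up dates, assuming everything is in UTC; the function names and the sample dates are illustrative only.

```python
from collections import Counter
from datetime import datetime, timezone


def quarter_start(dt):
    """Return midnight UTC on the first day of the quarter containing dt."""
    first_month = 3 * ((dt.month - 1) // 3) + 1
    return datetime(dt.year, first_month, 1, tzinfo=timezone.utc)


def quarterly_buckets(dates):
    """Count dates per calendar quarter, keyed by the quarter start."""
    counts = Counter(quarter_start(d) for d in dates)
    return [
        {
            "key_as_string": start.strftime("%Y/%m/%d %H:%M:%S"),
            "key": int(start.timestamp() * 1000),  # epoch milliseconds
            "doc_count": counts[start],
        }
        for start in sorted(counts)
    ]


# Hypothetical opening dates, assumed to be in UTC.
opening_dates = [
    datetime(2018, 1, 15, tzinfo=timezone.utc),
    datetime(2018, 3, 31, tzinfo=timezone.utc),
    datetime(2018, 5, 2, tzinfo=timezone.utc),
]
date_buckets = quarterly_buckets(opening_dates)
print(date_buckets)
```

The two keys produced here, 1514764800000 and 1522540800000, are the same values shown in the response above for the quarters starting 2018/01/01 and 2018/04/01.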

 

TODO

  1. Explore the use of "offset" with histogram and date_histogram. 
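As a starting point for this TODO, the Elasticsearch documentation describes the numeric bucket key as the value shifted by the offset, rounded down to a multiple of the interval, then shifted back. A minimal Python sketch of that formula follows; the function name and sample values are made up. For date_histogram the offset is given as a time duration such as "+6h" rather than a number, which this sketch does not cover.

```python
import math


def bucket_key(value, interval, offset=0):
    """Lower bound of the bucket holding value, with buckets shifted by offset."""
    return math.floor((value - offset) / interval) * interval + offset


# With interval 10 and no offset, buckets are [20, 30), [30, 40), ...
print(bucket_key(32, 10))      # bucket starting at 30
# With offset 5, buckets become [15, 25), [25, 35), ...
print(bucket_key(32, 10, 5))   # bucket starting at 25
print(bucket_key(24, 10, 5))   # bucket starting at 15
```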
