Problem:
Demonstrate the terms bucket aggregation.
Solution Summary:
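Bucket aggregations group documents into buckets based on some criterion. The terms aggregation creates one bucket per distinct value of a field and reports how many documents (doc_count) fall into each bucket.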
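As a rough illustration of the idea, the following Python sketch groups in-memory documents by a field value the way a terms aggregation does on a single shard. The function name terms_agg, the sample documents and the tie-breaking rule (key ascending, which matches the order of the Case 1 response below) are assumptions for illustration, not Elasticsearch's implementation.

```python
from collections import Counter

def terms_agg(docs, field, size=10):
    """Toy single-shard model of a terms aggregation (not Elasticsearch code).

    One bucket per distinct value of `field`; buckets are ordered by
    doc_count descending with ties broken by key ascending, and only the
    first `size` buckets are kept. Documents without the field are skipped.
    """
    counts = Counter(doc[field] for doc in docs if field in doc)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    top = ordered[:size]
    returned = sum(count for _, count in top)
    return {
        # documents that landed in buckets we did not return
        "sum_other_doc_count": sum(counts.values()) - returned,
        "buckets": [{"key": key, "doc_count": count} for key, count in top],
    }

# Hypothetical sample documents; the last one has no "state" field.
docs = [{"state": "TX"}, {"state": "ID"}, {"state": "TX"}, {"state": "AL"}, {}]
print(terms_agg(docs, "state", size=2))
```

With size=2 the sketch keeps TX (2) and AL (1). ID ties with AL and loses on key order, so its single document is counted in sum_other_doc_count instead.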
Prerequisites:
Set up the accounts index from accounts.json as explained in the link.
Solution Steps:
Case 1 - Terms Aggregation with Many Buckets
GET accounts/_search
{
  "aggs": {
    "state_terms": {
      "terms": {
        "field": "state.keyword"
      }
    }
  },
  "size": 0
}
Response contains:
"aggregations": {
  "state_terms": {
    "doc_count_error_upper_bound": 20,
    "sum_other_doc_count": 770,
    "buckets": [
      {
        "key": "ID",
        "doc_count": 27
      },
      {
        "key": "TX",
        "doc_count": 27
      },
      {
        "key": "AL",
        "doc_count": 25
      },
      {
        "key": "MD",
        "doc_count": 25
      },
      {
        "key": "TN",
        "doc_count": 23
      },
      {
        "key": "MA",
        "doc_count": 21
      },
      {
        "key": "NC",
        "doc_count": 21
      },
      {
        "key": "ND",
        "doc_count": 21
      },
      {
        "key": "ME",
        "doc_count": 20
      },
      {
        "key": "MO",
        "doc_count": 20
      }
    ]
  }
}
Note:
- Elasticsearch returns buckets only for the top terms (by default the 10 terms with the highest document counts, controlled by the size parameter). The total number of documents in the remaining, unreturned buckets is reported as "sum_other_doc_count".
- An index is split into shards. Each shard computes its own top terms and sends them to the coordinating node, which merges the shard results into the final response. Because each shard sends only its top terms (controlled by shard_size), the document counts may be approximate, as explained here.
- The value of "doc_count_error_upper_bound" is the maximum possible document count for a term that did not make it into the final list of terms. It is calculated as the sum of the document counts of the last term returned by each shard.
- A per-bucket document count error can also be enabled by setting the show_term_doc_count_error parameter to true. With this setting, every bucket also gets its own doc_count_error_upper_bound.
- Accuracy can be improved at the cost of performance, for example by increasing shard_size so that each shard sends more candidate terms to the coordinating node.
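The shard-level behaviour described in these notes can be simulated with a small Python model. It is a sketch under simplifying assumptions, not Elasticsearch's code: each shard is a list of dicts, shard_response and coordinate are hypothetical helpers, and we assume a shard contributes an error only when it had to cut its term list at shard_size (a shard that sends fewer terms has sent all of them).

```python
from collections import Counter

def shard_response(docs, field, shard_size):
    """What one shard sends to the coordinating node in this toy model:
    its top `shard_size` terms, its total count of documents with the field,
    and its error (count of the last term sent, if the list was cut off)."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    top = ordered[:shard_size]
    # A shard that sent fewer than shard_size terms sent every term it has,
    # so it cannot be hiding documents for any term.
    error = top[-1][1] if len(top) == shard_size else 0
    return dict(top), sum(counts.values()), error

def coordinate(shards, field, size, shard_size):
    """Merge the shard responses the way the notes above describe."""
    responses = [shard_response(docs, field, shard_size) for docs in shards]
    merged = Counter()
    for top, _, _ in responses:
        merged.update(top)  # add this shard's counts per term
    final = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))[:size]
    buckets = []
    for key, count in final:
        # Per-bucket error (show_term_doc_count_error): only shards that
        # did not send this term may hold uncounted documents for it.
        bucket_error = sum(err for top, _, err in responses if key not in top)
        buckets.append({"key": key, "doc_count": count,
                        "doc_count_error_upper_bound": bucket_error})
    total = sum(shard_total for _, shard_total, _ in responses)
    return {
        "doc_count_error_upper_bound": sum(err for _, _, err in responses),
        "sum_other_doc_count": total - sum(b["doc_count"] for b in buckets),
        "buckets": buckets,
    }

# Two hypothetical shards holding terms X, Y and Z in different proportions.
shard_a = [{"t": "X"}] * 5 + [{"t": "Y"}] * 4 + [{"t": "Z"}] * 3
shard_b = [{"t": "Z"}] * 5 + [{"t": "Y"}] * 4 + [{"t": "X"}] * 1
result = coordinate([shard_a, shard_b], "t", size=2, shard_size=2)
print(result)
```

Here term Z has 8 documents in total but is left out of the result, while X (6 documents in total) is reported with only 5; X's per-bucket bound of 4 covers the one X document that shard_b did not send. With shard_size=3 both shards send all of their terms, and the top two buckets become Y and Z with their true counts of 8.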
Case 2 - Terms Aggregation with Fewer Buckets
GET accounts/_search
{
  "aggs": {
    "state_terms": {
      "terms": {
        "field": "opening_date"
      }
    }
  },
  "size": 0
}
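Note: opening_date has only 5 distinct values, which were added to the index for this demo. Because that is fewer than the default size of 10, every term fits in the result, so doc_count_error_upper_bound and sum_other_doc_count are both 0 in the response.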
Response contains:
"aggregations": {
  "state_terms": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": 1514764800000,
        "key_as_string": "2018/01/01 00:00:00",
        "doc_count": 2
      },
      {
        "key": 1517702400000,
        "key_as_string": "2018/02/04 00:00:00",
        "doc_count": 2
      },
      {
        "key": 1520553600000,
        "key_as_string": "2018/03/09 00:00:00",
        "doc_count": 2
      },
      {
        "key": 1523836800000,
        "key_as_string": "2018/04/16 00:00:00",
        "doc_count": 2
      },
      {
        "key": 1527206400000,
        "key_as_string": "2018/05/25 00:00:00",
        "doc_count": 2
      }
    ]
  }
}
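The numeric key of each date bucket is the date as milliseconds since the Unix epoch, and key_as_string is the same instant rendered in the field's date format. The Python sketch below reproduces the key_as_string values above. The helper name is hypothetical, and the UTC timezone and the yyyy/MM/dd HH:mm:ss format are assumptions inferred from the response rather than read from the mapping.

```python
from datetime import datetime, timezone

def key_as_string(key_ms, fmt="%Y/%m/%d %H:%M:%S"):
    """Render an epoch-milliseconds bucket key as a date string.
    Assumes UTC and the yyyy/MM/dd HH:mm:ss format seen in the response."""
    return datetime.fromtimestamp(key_ms / 1000, tz=timezone.utc).strftime(fmt)

for key in [1514764800000, 1517702400000, 1520553600000, 1523836800000, 1527206400000]:
    print(key, key_as_string(key))
```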
Case 3 - Put missing values into a bucket with a default key
GET accounts/_search
{
  "aggs": {
    "state_terms": {
      "terms": {
        "field": "opening_date",
        "missing": "2017/12/31"
      }
    }
  },
  "size": 0
}
Response contains:
"aggregations": {
  "state_terms": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": 1514678400000,
        "key_as_string": "2017/12/31 00:00:00",
        "doc_count": 990
      },
      {
        "key": 1514764800000,
        "key_as_string": "2018/01/01 00:00:00",
        "doc_count": 2
      },
      {
        "key": 1517702400000,
        "key_as_string": "2018/02/04 00:00:00",
        "doc_count": 2
      },
      {
        "key": 1520553600000,
        "key_as_string": "2018/03/09 00:00:00",
        "doc_count": 2
      },
      {
        "key": 1523836800000,
        "key_as_string": "2018/04/16 00:00:00",
        "doc_count": 2
      },
      {
        "key": 1527206400000,
        "key_as_string": "2018/05/25 00:00:00",
        "doc_count": 2
      }
    ]
  }
}
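The effect of missing can be sketched in Python: documents without the field are counted under the default key instead of being ignored, which is why the 2017/12/31 bucket above holds the 990 documents that have no opening_date. The function name is hypothetical, and in this toy model the keys are plain strings; in the real response Elasticsearch parses the missing value as a date, as the 1514678400000 key shows.

```python
from collections import Counter

def terms_with_missing(docs, field, missing=None):
    """Toy model of the terms `missing` option (not Elasticsearch code).

    Documents whose `field` is absent or None are counted under the
    `missing` key; without `missing` they are not counted at all.
    Returns (key, doc_count) pairs, doc_count descending, key ascending.
    """
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is not None:
            counts[value] += 1
        elif missing is not None:
            counts[missing] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical documents: two with an opening_date, three without.
docs = [{"opening_date": "2018/01/01"}] * 2 + [{}] * 3
print(terms_with_missing(docs, "opening_date"))
# [('2018/01/01', 2)]
print(terms_with_missing(docs, "opening_date", missing="2017/12/31"))
# [('2017/12/31', 3), ('2018/01/01', 2)]
```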
Case 4 - Set a minimum document count for buckets
GET accounts/_search
{
  "aggs": {
    "state_terms": {
      "terms": {
        "field": "opening_date",
        "missing": "2017/12/31",
        "min_doc_count": 3
      }
    }
  },
  "size": 0
}
Response contains:
"aggregations": {
  "state_terms": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": 1514678400000,
        "key_as_string": "2017/12/31 00:00:00",
        "doc_count": 990
      }
    ]
  }
}
Note: Buckets with fewer than min_doc_count documents are not returned. The default value of min_doc_count is 1, so by default buckets with no documents are not returned (unless you set min_doc_count to 0).
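The min_doc_count rule can be sketched as a filter applied after counting. In this toy Python model (the function name and data are hypothetical), setting min_doc_count to 0 also adds zero-count buckets for terms that exist in the index but are absent from the matching documents, which is our reading of how Elasticsearch treats that setting rather than a description of its code.

```python
from collections import Counter

def terms_min_doc_count(index_docs, matching_docs, field, min_doc_count=1):
    """Toy model of min_doc_count (not Elasticsearch code).

    index_docs: every document in the index (used only to learn which terms exist)
    matching_docs: the documents that matched the query
    Returns (key, doc_count) pairs, doc_count descending, key ascending.
    """
    counts = Counter(d[field] for d in matching_docs if field in d)
    if min_doc_count == 0:
        # Terms present in the index but not in the hits get an empty bucket.
        for d in index_docs:
            if field in d:
                counts.setdefault(d[field], 0)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(k, c) for k, c in ordered if c >= min_doc_count]

index_docs = [{"state": "TX"}, {"state": "TX"}, {"state": "ID"}, {"state": "AL"}]
hits = index_docs[:3]  # pretend a query matched only the first three
print(terms_min_doc_count(index_docs, hits, "state"))                   # [('TX', 2), ('ID', 1)]
print(terms_min_doc_count(index_docs, hits, "state", min_doc_count=2))  # [('TX', 2)]
print(terms_min_doc_count(index_docs, hits, "state", min_doc_count=0))  # [('TX', 2), ('ID', 1), ('AL', 0)]
```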
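TODO
- Add ordering to any one of the above bucket aggregations, for example "order": { "_key": "asc" }. Older examples use "_term" instead of "_key"; "_term" was deprecated in favor of "_key" in Elasticsearch 6.0.
- Try doing a terms bucket aggregation across indexes.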
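To see what the order option changes before trying it against the index, here is a toy Python model of ordering buckets by key or by count (the function name is hypothetical). The default branch copies the ordering visible in the Case 1 response; for ties under an explicit _count order this sketch simply keeps the input order, which may differ from Elasticsearch.

```python
def order_buckets(buckets, order=None):
    """Toy model of the terms `order` option on (key, doc_count) pairs.

    order=None mimics the default seen in Case 1: doc_count descending,
    ties broken by key ascending. Otherwise pass a one-entry dict such as
    {"_key": "asc"} or {"_count": "desc"}.
    """
    if order is None:
        return sorted(buckets, key=lambda b: (-b[1], b[0]))
    sort_by, direction = next(iter(order.items()))
    position = {"_key": 0, "_count": 1}[sort_by]
    return sorted(buckets, key=lambda b: b[position], reverse=(direction == "desc"))

# Four of the Case 1 buckets, deliberately shuffled.
buckets = [("TX", 27), ("AL", 25), ("ID", 27), ("MD", 25)]
print(order_buckets(buckets))                   # [('ID', 27), ('TX', 27), ('AL', 25), ('MD', 25)]
print(order_buckets(buckets, {"_key": "asc"}))  # [('AL', 25), ('ID', 27), ('MD', 25), ('TX', 27)]
```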