
Bucket Aggregations

Bucket aggregations don’t calculate metrics over fields like the metrics aggregations do, but instead, they create buckets of documents. Each bucket is associated with a criterion (depending on the aggregation type) which determines whether or not a document in the current context “falls” into it. In other words, the buckets effectively define document sets. In addition to the buckets themselves, the bucket aggregations also compute and return the number of documents that “fell into” each bucket.

Bucket aggregations, as opposed to metrics aggregations, can hold sub-aggregations. These sub-aggregations will be aggregated for the buckets created by their “parent” bucket aggregation.

There are different bucket aggregators, each with a different “bucketing” strategy. Some define a single bucket, some define a fixed number of buckets, and others create the buckets dynamically during the aggregation process.
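To make the idea of "buckets as document sets" concrete, here is a minimal client-side sketch in Python. It is not the server's implementation: the function name `bucket_by_value` and the `_key`/`_doc_count` array shape of each bucket are assumptions made for illustration.

```python
from collections import Counter

def bucket_by_value(docs, field):
    """Create one bucket per distinct value of `field` and count its documents.

    Hypothetical helper: it mimics a dynamically bucketing aggregation,
    where buckets are discovered while scanning the documents.
    """
    counts = Counter(doc[field] for doc in docs if field in doc)
    return [{"_key": key, "_doc_count": n} for key, n in counts.items()]

docs = [
    {"favoriteFruit": "apple"},
    {"favoriteFruit": "banana"},
    {"favoriteFruit": "apple"},
]
buckets = bucket_by_value(docs, "favoriteFruit")
```

Each document "falls into" exactly one bucket here because the criterion is equality on a single field value; other bucketing strategies use different criteria.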

Available bucket aggregations:

Filter
Values
Terms
Date Histogram 🔧
Date Range 🔧
Geo-spatial Distance 🔧
Geo-spatial Trixels 🔧
Histogram
Missing value 🔧
Range
IP range 🔧
Geo-spatial IP 🔧

Unimplemented Features!
Some features haven’t yet been implemented…
Pull requests are welcome!

Structuring

The following snippet captures the basic structure of a bucket aggregation:

"<aggregation_name>": {
  "<bucket_aggregation_type>": {
    ( "_sort": {  <sort_body> }, )?
    ( "_limit": <limit_count>, )?
    ( "_min_doc_count": <min_doc_count>, )?
    ( "_keyed": <keyed_boolean>, )?
    ...
  },
  ...
}

Ordering

The order of the buckets can be customized by setting a <sort_body> in the _sort setting. By default, each bucket aggregation type has its own ordering (e.g. the Histogram Aggregation orders its returned buckets by their key, ascending). This default behaviour can be changed as documented below:

Ordering the buckets by their document count in an ascending manner:

SEARCH /bank/
{
  "_query": "*",
  "_limit": 0,
  "_check_at_least": 1000,
  "_aggs": {
    "fruits": {
      "_values": {
        "_field": "favoriteFruit",
        "_sort": { "_doc_count": "asc" }
      }
    }
  }
}

Ordering the buckets alphabetically by their keys in an ascending manner:

SEARCH /bank/
{
  "_query": "*",
  "_limit": 0,
  "_check_at_least": 1000,
  "_aggs": {
    "fruits": {
      "_values": {
        "_field": "favoriteFruit",
        "_sort": { "_key": "asc" }
      }
    }
  }
}
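The effect of a `_sort` body on the returned buckets can be sketched client-side as follows. This is only an illustration under assumptions: `sort_buckets` is a hypothetical helper, the buckets use an assumed `_key`/`_doc_count` shape, and the `"desc"` direction is assumed by analogy with `"asc"`.

```python
def sort_buckets(buckets, sort_body):
    """Order buckets by a sort body holding a single {field: direction} pair."""
    (field, direction), = sort_body.items()
    if direction not in ("asc", "desc"):  # "desc" is an assumption
        raise ValueError("direction must be 'asc' or 'desc'")
    return sorted(buckets, key=lambda b: b[field], reverse=(direction == "desc"))

fruits = [
    {"_key": "banana", "_doc_count": 3},
    {"_key": "apple", "_doc_count": 5},
    {"_key": "strawberry", "_doc_count": 1},
]
by_count = sort_buckets(fruits, {"_doc_count": "asc"})
by_key = sort_buckets(fruits, {"_key": "asc"})
```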

Ordering by Sub Aggregations

Ordering the buckets by a single-value metrics sub-aggregation (identified by the aggregation name):

SEARCH /bank/
{
  "_query": "*",
  "_limit": 0,
  "_check_at_least": 1000,
  "_aggs": {
    "balance_by_state": {
      "_values": {
        "_field": "contact.state",
        "_sort": { "max_balance_count._max": "asc" }
      },
      "_aggs": {
        "max_balance_count": {
          "_max": {
            "_field": "balance"
          }
        }
      }
    }
  }
}

Ordering the buckets by a multi-value metrics sub-aggregation (identified by the aggregation name):

SEARCH /bank/
{
  "_query": "*",
  "_limit": 0,
  "_check_at_least": 1000,
  "_aggs": {
    "balance_by_state": {
      "_values": {
        "_field": "contact.state",
        "_sort": { "balance_stats._max": "asc" }
      },
      "_aggs": {
        "balance_stats": {
          "_stats": {
            "_field": "balance"
          }
        }
      }
    }
  }
}
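A sort key such as `balance_stats._max` can be read as a dotted path into each bucket. The sketch below shows that reading on the client side; `resolve` and `sort_by_path` are hypothetical helpers, and the bucket layout is an assumed one rather than a documented response.

```python
def resolve(bucket, path):
    """Follow a dotted path such as 'balance_stats._max' inside a bucket."""
    value = bucket
    for part in path.split("."):
        value = value[part]
    return value

def sort_by_path(buckets, path, direction="asc"):
    """Order buckets by the metric value found at `path`."""
    return sorted(buckets, key=lambda b: resolve(b, path),
                  reverse=(direction == "desc"))

states = [
    {"_key": "TX", "_doc_count": 2, "balance_stats": {"_min": 100.0, "_max": 4000.0}},
    {"_key": "NY", "_doc_count": 3, "balance_stats": {"_min": 50.0, "_max": 2500.0}},
]
ordered = sort_by_path(states, "balance_stats._max")
```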

Deep Ordering

Unimplemented Feature!
This feature hasn’t yet been implemented…
Pull requests are welcome!

It is also possible to order the buckets based on a “deeper” aggregation in the hierarchy. This is supported as long as every aggregation in the path is of a single-bucket type, where the last aggregation in the path may be either a single-bucket one or a metrics one. If it’s a single-bucket type, the order is defined by the number of documents in the bucket (i.e. _doc_count). If it’s a metrics one, the same rules as above apply: for a multi-value metrics aggregation the path must indicate the metric name to sort by, and for a single-value metrics aggregation the sort is applied on that value.

SEARCH /bank/
{
  "_query": "*",
  "_limit": 0,
  "_check_at_least": 1000,
  "_aggs": {
    "states": {
      "_values": {
        "_field": "contact.state",
        "_sort": { "cities.*.balance_stats._max": "asc" }
      },
      "_aggs": {
        "cities": {
          "_values": {
            "_field": "contact.city"
          },
          "_aggs": {
            "balance_stats": {
              "_stats": {
                "_field": "balance"
              }
            }
          }
        }
      }
    }
  }
}

Limit

There is currently no hard limit on the number of buckets returned in a single response, but by default at most 10,000 buckets are returned. The <limit_count> in the _limit option is a positive integer that changes this default.
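As a rough sketch of what the limit means for the result, the following Python truncates an already ordered bucket list. `apply_limit` and `DEFAULT_LIMIT` are hypothetical names; that truncation happens after ordering is an assumption, not something stated above.

```python
DEFAULT_LIMIT = 10_000  # default number of buckets, as described above

def apply_limit(buckets, limit=DEFAULT_LIMIT):
    """Keep at most `limit` buckets from an already ordered list."""
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("_limit must be a positive integer")
    return buckets[:limit]

first_two = apply_limit(["a", "b", "c"], limit=2)
capped = apply_limit(list(range(20_000)))
```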

Response Format

By default, the buckets are returned as an ordered array. It is also possible to request the response as an object keyed by the bucket keys, using the _keyed boolean option:

SEARCH /bank/
{
  "_query": "*",
  "_limit": 0,
  "_check_at_least": 1000,
  "_aggs": {
    "balances": {
      "_histogram": {
        "_field": "balance",
        "_interval": 1000,
        "_keyed": true
      }
    }
  }
}

Response:

  "aggregations": {
    "_doc_count": 1000,
    "balances": {
      "0.0": {
        "_doc_count": 55
      },
      "1000.0": {
        "_doc_count": 329
      },
      "2000.0": {
        "_doc_count": 286
      },
      "3000.0": {
        "_doc_count": 294
      },
      "4000.0": {
        "_doc_count": 1
      },
      "5000.0": {
        "_doc_count": 1
      },
      "6000.0": {
        "_doc_count": 4
      },
      "7000.0": {
        "_doc_count": 12
      },
      "10000.0": {
        "_doc_count": 9
      },
      "12000.0": {
        "_doc_count": 1
      }
    }
  }, ...
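The relation between the two response formats can be sketched as a small conversion. This is an illustration only: `to_keyed` is a hypothetical helper, and the array form with a `_key` member per bucket is assumed rather than taken from a documented response.

```python
def to_keyed(buckets):
    """Turn an ordered array of buckets into an object keyed by bucket key."""
    return {
        str(bucket["_key"]): {k: v for k, v in bucket.items() if k != "_key"}
        for bucket in buckets
    }

array_form = [
    {"_key": 0.0, "_doc_count": 55},
    {"_key": 1000.0, "_doc_count": 329},
]
keyed_form = to_keyed(array_form)
```

Python dictionaries preserve insertion order, so the keyed form above keeps the order of the array; consumers in other languages should not rely on object key order.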

Minimum Document Count

It is possible to return only the buckets that match at least a configured number of hits, using the _min_doc_count option:

SEARCH /bank/
{
  "_query": "*",
  "_limit": 0,
  "_check_at_least": 1000,
  "_aggs": {
    "employers": {
      "_values": {
        "_field": "employer",
        "_min_doc_count": 5
      }
    }
  }
}

The above aggregation would only return employers that have been found in 5 hits or more. The default value is 1.
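The threshold is inclusive ("5 hits or more"), which the following sketch makes explicit. `filter_min_doc_count` is a hypothetical helper, and the employer names and bucket shape are invented for illustration.

```python
def filter_min_doc_count(buckets, min_doc_count=1):
    """Keep buckets whose document count is at least `min_doc_count`."""
    return [b for b in buckets if b["_doc_count"] >= min_doc_count]

employers = [
    {"_key": "Acme", "_doc_count": 7},     # illustrative data
    {"_key": "Globex", "_doc_count": 5},
    {"_key": "Initech", "_doc_count": 4},
]
kept = filter_min_doc_count(employers, min_doc_count=5)
```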

Filtering Values

Unimplemented Feature!
This feature hasn’t yet been implemented…
Pull requests are welcome!

It is possible to filter the values for which buckets will be created. This can be done using the include and exclude parameters which are based on regular expression strings or arrays of exact values. Additionally, include clauses can filter using partition expressions.
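Since this feature is not implemented yet, the following is only a sketch of how include and exclude could behave, written as plain Python. Everything here is an assumption: the helper names, the choice to match a regular expression against the whole value, and applying exclude after include. Partition expressions are not covered.

```python
import re

def matches(value, spec):
    """A string spec is treated as a regular expression, a list as exact values."""
    if isinstance(spec, str):
        return re.fullmatch(spec, value) is not None  # whole-value match (assumed)
    return value in spec

def filter_values(values, include=None, exclude=None):
    """Keep values accepted by `include` (if given) and not rejected by `exclude`."""
    kept = []
    for value in values:
        if include is not None and not matches(value, include):
            continue
        if exclude is not None and matches(value, exclude):
            continue
        kept.append(value)
    return kept

selected = filter_values(["apple", "apricot", "banana"],
                         include="a.*", exclude=["apricot"])
```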

Collect Mode

Unimplemented Feature!
This feature hasn’t yet been implemented…
Pull requests are welcome!

Deferring calculation of child aggregations

For fields with many unique terms and a small number of required results it can be more efficient to delay the calculation of child aggregations until the top parent-level aggs have been pruned. Ordinarily, all branches of the aggregation tree are expanded in one depth-first pass and only then any pruning occurs. In some scenarios this can be very wasteful and can hit memory constraints.
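The idea of deferring child aggregations can be sketched in two passes: first count and prune the parent buckets, then compute the child metric only for the survivors. This is a conceptual illustration, not how the server works; `top_buckets_deferred`, the `size` parameter, and the output shape are hypothetical.

```python
from collections import Counter

def top_buckets_deferred(docs, field, metric_field, size):
    """Compute a max sub-aggregation only for the `size` largest buckets."""
    # Pass 1: cheap document counts for every bucket, then prune.
    counts = Counter(doc[field] for doc in docs)
    top_keys = [key for key, _ in counts.most_common(size)]
    wanted = set(top_keys)

    # Pass 2: the child metric is tracked only for surviving buckets.
    maxima = {}
    for doc in docs:
        key = doc[field]
        if key in wanted:
            value = doc[metric_field]
            maxima[key] = max(maxima.get(key, value), value)

    return [{"_key": k, "_doc_count": counts[k], "_max": maxima[k]}
            for k in top_keys]

accounts = [
    {"state": "TX", "balance": 10}, {"state": "TX", "balance": 30},
    {"state": "NY", "balance": 99}, {"state": "TX", "balance": 20},
    {"state": "CA", "balance": 5},  {"state": "NY", "balance": 1},
]
top = top_buckets_deferred(accounts, "state", "balance", size=2)
```

In the one-pass approach, a running maximum would be kept for every distinct state before pruning; here the `maxima` table never holds more than `size` entries, at the cost of reading the documents twice.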

Missing Value

Unimplemented Feature!
This feature hasn’t yet been implemented…
Pull requests are welcome!

The _missing parameter defines how documents that are missing a value should be treated. By default they will be ignored but it is also possible to treat them as if they had a value.

SEARCH /bank/
{
  "_query": "*",
  "_limit": 0,
  "_check_at_least": 1000,
  "_aggs": {
    "gender": {
      "_values": {
        "_field": "gender",
        "_missing": "N/A"
      }
    }
  }
}

Documents without a value in the gender field will fall into the same bucket as documents that have the value "N/A".
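The described behaviour can be sketched as a substitution step before bucketing. This feature is not implemented yet, so the code is only an illustration of the intended semantics; `count_with_missing` is a hypothetical helper.

```python
from collections import Counter

def count_with_missing(docs, field, missing=None):
    """Count documents per value, optionally substituting `missing` for absent values."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is None:
            value = missing      # treat the document as if it had this value
        if value is None:
            continue             # default: documents without a value are ignored
        counts[value] += 1
    return dict(counts)

people = [{"gender": "female"}, {}, {"gender": "N/A"}]
ignored = count_with_missing(people, "gender")
substituted = count_with_missing(people, "gender", missing="N/A")
```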

Sub Aggregations

Alongside the <bucket_aggregation_type>, an additional _aggregations object (written as _aggs in the examples on this page) can be added to nest other sub-aggregations.

The following example not only “buckets” the documents into the different ranges, but also computes statistics over the ages of account holders in each balance range:

SEARCH /bank/
{
  "_query": "*",
  "_limit": 0,
  "_check_at_least": 1000,
  "_aggs": {
    "balances_by_range": {
      "_range": {
        "_field": "balance",
        "_keyed": true,
        "_ranges": [
          { "_key": "poor", "_to": 2000 },
          { "_key": "average", "_from": 2000, "_to": 4000 },
          { "_key": "rich", "_from": 4000 }
        ]
      },
      "_aggs": {
        "age_stats": {
          "_stats": {
            "_field": "age"
          }
        }
      }
    }
  }
}

Response:

{
  "aggregations": {
    "_doc_count": 1000,
    "balances_by_range": {
      "poor": {
        "_doc_count": 384,
        "age_stats": {
          "_count": 384,
          "_min": 17.0,
          "_max": 44.0,
          "_avg": 28.817708333333333,
          "_sum": 11066.0
        }
      },
      "average": {
        "_doc_count": 580,
        "age_stats": {
          "_count": 580,
          "_min": 17.0,
          "_max": 42.0,
          "_avg": 30.387931034482759,
          "_sum": 17625.0
        }
      },
      "rich": {
        "_doc_count": 36,
        "age_stats": {
          "_count": 36,
          "_min": 20.0,
          "_max": 44.0,
          "_avg": 37.52777777777778,
          "_sum": 1351.0
        }
      }
    }
  }, ...
}
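How a sub-aggregation is evaluated once per parent bucket can be sketched in Python as below. The helpers are hypothetical, the sample accounts are invented, and treating `_from` as inclusive and `_to` as exclusive is an assumption about the range boundaries.

```python
def in_range(value, rng):
    """Assumed semantics: `_from` is inclusive, `_to` is exclusive."""
    return rng.get("_from", float("-inf")) <= value < rng.get("_to", float("inf"))

def stats(values):
    """Multi-value metric over the values of one bucket."""
    return {"_count": len(values), "_min": min(values), "_max": max(values),
            "_avg": sum(values) / len(values), "_sum": sum(values)}

def range_with_stats(docs, field, ranges, stats_name, stats_field):
    """Bucket docs by range, then run the stats sub-aggregation inside each bucket."""
    result = {}
    for rng in ranges:
        members = [d for d in docs if in_range(d[field], rng)]
        bucket = {"_doc_count": len(members)}
        if members:
            bucket[stats_name] = stats([float(d[stats_field]) for d in members])
        result[rng["_key"]] = bucket
    return result

ranges = [
    {"_key": "poor", "_to": 2000},
    {"_key": "average", "_from": 2000, "_to": 4000},
    {"_key": "rich", "_from": 4000},
]
accounts = [  # illustrative data, not the bank dataset
    {"balance": 1500, "age": 20}, {"balance": 2000, "age": 30},
    {"balance": 3500, "age": 40}, {"balance": 9000, "age": 50},
]
by_range = range_with_stats(accounts, "balance", ranges, "age_stats", "age")
```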

Mixing field types

Warning
When aggregating on multiple indices, the type of the aggregated field may not be the same in all indices. Some types are compatible with each other (positive integer and float), but when the types are a mix of decimal and non-decimal numbers, the terms aggregation will promote the non-decimal numbers to decimal numbers. This can result in a loss of precision in the bucket values.
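The loss of precision can be demonstrated with ordinary double-precision floats. The sketch assumes that "decimal" values are stored as IEEE 754 doubles, which is an assumption about the storage type and not stated above.

```python
# Two distinct integer keys just above the 53-bit mantissa of a double.
int_keys = [2**53, 2**53 + 1]

# Promoting them to floats, as a mixed-type aggregation would, merges them.
promoted_keys = {float(key) for key in int_keys}

distinct_before = len(set(int_keys))
distinct_after = len(promoted_keys)
```

Here two different integer bucket keys end up as the same float key, so their documents would be counted in a single bucket.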
