Histogram Aggregation

A multi-bucket values source based aggregation that can be applied on numeric values extracted from the documents. It dynamically builds fixed size (a.k.a. interval) buckets over the values. For example, if the documents have a field that holds a balance (numeric), we can configure this aggregation to dynamically build buckets with interval 500 (in case of balance it may represent $500). When the aggregation executes, the balance field of every document will be evaluated and will be rounded down to its closest bucket - for example, if the balance is 3200 and the bucket size is 500 then the rounding will yield 3000 and thus the document will “fall” into the bucket that is associated with the key 3000. To make this more formal, here is the rounding function that is used:

bucket_key = floor((value - _shift) / _interval) * _interval + _shift;

Structuring

The following snippet captures the structure of histogram aggregations:

"<aggregation_name>": {
  "_histogram": {
    "_field": "<field_name>",
    "_interval": "<interval>",
    ( "_shift": <shift> )?
  },
  ...
}

Also supports all other functionality as explained in Bucket Aggregations.

Field

The <field_name> in the _field parameter defines the field on which the aggregation will act upon.

Interval

The _interval must be a positive decimal, while the _shift must be a decimal in [0, _interval) (a decimal greater than or equal to 0 and less than _interval)

Assuming the data consists of documents representing bank accounts, as shown in the sample dataset of Data Exploration section, the following snippet “buckets” the bank accounts based on their balance by interval of 500:

SEARCH /bank/
{
  "_query": "*",
  "_limit": 0,
  "_check_at_least": 1000,
  "_aggs": {
    "balances": {
      "_histogram": {
        "_field": "balance",
        "_interval": 1000
      }
    }
  }
}

And the following may be the response:

  "aggregations": {
    "_doc_count": 1000,
    "balances": [
      {
        "_doc_count": 55,
        "_key": "0.0"
      },
      {
        "_doc_count": 329,
        "_key": "1000.0"
      },
      {
        "_doc_count": 286,
        "_key": "2000.0"
      },
      {
        "_doc_count": 294,
        "_key": "3000.0"
      },
      {
        "_doc_count": 1,
        "_key": "4000.0"
      },
      {
        "_doc_count": 1,
        "_key": "5000.0"
      },
      {
        "_doc_count": 4,
        "_key": "6000.0"
      },
      {
        "_doc_count": 12,
        "_key": "7000.0"
      },
      {
        "_doc_count": 7,
        "_key": "8000.0"
      },
      {
        "_doc_count": 1,
        "_key": "9000.0"
      },
      {
        "_doc_count": 9,
        "_key": "10000.0"
      },
      {
        "_doc_count": 1,
        "_key": "12000.0"
      }
    ]
  }, ...

Shift

By default the bucket keys start with 0 and then continue in even spaced steps of interval, e.g. if the interval is 10 the first buckets (assuming there is data inside them) will be [0, 10), [10, 20), [20, 30). The bucket boundaries can be shifted by using the _shift option.

This can be best illustrated with an example. If there are many account holders with ages ranging from 17 to 44, using interval 10 will result in four buckets: [10, 20), [20, 30), [30, 40), [40, 50). If an additional _shift of 5 is used, however, there will be only three buckets to collect all the account holders: [15, 25), [25, 35), [35, 45):

SEARCH /bank/
{
  "_query": "*",
  "_limit": 0,
  "_check_at_least": 1000,
  "_aggs": {
    "ages": {
      "_histogram": {
        "_field": "age",
        "_interval": 10,
        "_shift": 5
      }
    }
  }
}

Response:

{
  "aggregations": {
    "_doc_count": 1000,
    "ages": [
      {
        "_doc_count": 243,
        "_key": "15"
      },
      {
        "_doc_count": 471,
        "_key": "25"
      },
      {
        "_doc_count": 286,
        "_key": "35"
      }
    ]
  }, ...
}

Ordering

By default, the returned buckets are sorted by their _key ascending, though the order behaviour can be controlled using the _sort setting. Supports the same order functionality as explained in Bucket Ordering.