GeoHex grid aggregations
The H3 grid system works well for proximity applications because it overcomes the limitations of Geohash’s non-uniform partitions. Geohash encodes latitude and longitude pairs by repeatedly halving the latitude and longitude ranges, so its cells narrow significantly toward the poles, where a degree of longitude spans much less distance than it does at the equator. In contrast, distortion in the H3 grid system is low and is concentrated in the 12 pentagonal cells that exist at each resolution (at the coarsest resolution, 12 of the 122 cells are pentagons). These pentagons are centered in low-use areas, such as the open ocean, leaving the important land areas largely unaffected. Thus, grouping documents based on the H3 grid system produces more uniform aggregation buckets than the Geohash grid.
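To make the Geohash distortion concrete, the following Python sketch estimates the width and height of a Geohash cell at a given latitude. It assumes a spherical Earth with a mean radius of 6,371 km and the standard Geohash bit interleaving, in which a hash of n characters carries 5n bits, with longitude taking the first bit (and the extra bit when 5n is odd). The function name geohash_cell_size_km is hypothetical.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical-Earth assumption
KM_PER_DEGREE = 2 * math.pi * EARTH_RADIUS_KM / 360  # about 111.19 km


def geohash_cell_size_km(length: int, latitude: float) -> Tuple[float, float]:
    """Approximate (width, height) in km of a Geohash cell with `length` characters."""
    bits = 5 * length
    lon_bits = (bits + 1) // 2  # longitude gets the first (and any extra) bit
    lat_bits = bits // 2
    lon_span = 360.0 / 2 ** lon_bits  # degrees of longitude covered by one cell
    lat_span = 180.0 / 2 ** lat_bits  # degrees of latitude covered by one cell
    # A degree of longitude shrinks with cos(latitude); a degree of latitude does not.
    width = lon_span * KM_PER_DEGREE * math.cos(math.radians(latitude))
    height = lat_span * KM_PER_DEGREE
    return width, height


if __name__ == "__main__":
    for lat in (0, 45, 60, 80):
        w, h = geohash_cell_size_km(5, lat)
        print(f"lat {lat:>2}: {w:5.2f} km wide x {h:5.2f} km tall")
```

At five characters the cell is roughly square at the equator, but its width halves by 60° latitude while its height stays the same, which is the non-uniformity that H3 avoids.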
The GeoHex grid aggregation groups geopoints into grid cells for geographical analysis. Each grid cell corresponds to an H3 cell and is identified using the H3Index representation.
The precision parameter controls the level of granularity that determines the grid cell size. The lower the precision, the larger the grid cells.
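To get a feel for how precision relates to cell size, the following Python sketch uses the known H3 cell count at resolution r, which is 2 + 120 × 7^r (122 cells at resolution 0, including the 12 pentagons), and divides a spherical Earth's surface evenly among those cells. It assumes that the precision value corresponds directly to the H3 resolution (0 to 15); the averages are approximations, because real H3 cells vary somewhat in area. The function names are hypothetical.

```python
import math

EARTH_AREA_KM2 = 4 * math.pi * 6371.0 ** 2  # about 510 million km^2, spherical Earth


def h3_cell_count(resolution: int) -> int:
    """Total number of H3 cells (hexagons plus 12 pentagons) at a resolution."""
    if not 0 <= resolution <= 15:
        raise ValueError("H3 resolutions range from 0 to 15")
    return 2 + 120 * 7 ** resolution


def average_cell_area_km2(resolution: int) -> float:
    """Average cell area: the Earth's surface divided evenly among all cells."""
    return EARTH_AREA_KM2 / h3_cell_count(resolution)


if __name__ == "__main__":
    for precision in (1, 6):
        count = h3_cell_count(precision)
        area = average_cell_area_km2(precision)
        print(f"precision {precision}: {count} cells, ~{area:,.1f} km^2 each")
```

Each increase in precision shrinks the average cell area by roughly a factor of 7, so cells at precision 6 are tens of thousands of times smaller than cells at precision 1.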
The following example illustrates low-precision and high-precision aggregation requests.
To start, create an index that maps the location field as a geo_point, and then index the following documents:
PUT national_parks/_doc/1
{
"name": "Yellowstone National Park",
"location": "44.42, -110.59"
}
PUT national_parks/_doc/2
{
"name": "Yosemite National Park",
"location": "37.87, -119.53"
}
PUT national_parks/_doc/3
{
"name": "Death Valley National Park",
"location": "36.53, -116.93"
}
You can index geopoints in several formats. For a list of all supported formats, see the geopoint documentation.
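The string format used in these documents lists latitude first, whereas the array format and the WKT POINT format list longitude first, a common source of swapped coordinates. The following Python sketch converts a "lat, lon" string into the object, array, and WKT representations. The helper to_geopoint_formats is hypothetical; it only reorders coordinates and does not validate ranges.

```python
from typing import Dict, List, Union


def to_geopoint_formats(lat_lon: str) -> Dict[str, Union[dict, List[float], str]]:
    """Convert a "lat, lon" string into other geopoint representations."""
    lat_text, lon_text = lat_lon.split(",")
    lat, lon = float(lat_text), float(lon_text)  # float() tolerates surrounding spaces
    return {
        "object": {"lat": lat, "lon": lon},  # explicit field names, order irrelevant
        "array": [lon, lat],                 # array order: longitude first
        "wkt": f"POINT ({lon} {lat})",       # WKT order: longitude first
    }


if __name__ == "__main__":
    print(to_geopoint_formats("44.42, -110.59"))
```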
Run a low-precision request, which groups documents into large grid cells:
GET national_parks/_search
{
"aggregations": {
"grouped": {
"geohex_grid": {
"field": "location",
"precision": 1
}
}
}
}
You can use either the GET or POST HTTP method for GeoHex grid aggregation queries.
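The response groups documents 2 and 3 (Yosemite and Death Valley) into one bucket because they are close enough to fall within the same large grid cell, while document 1 (Yellowstone) falls in a separate cell.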
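One way to see why documents 2 and 3 share a cell at low precision is to compare the great-circle distances between the parks. The following Python sketch applies the haversine formula on a spherical Earth (radius 6,371 km). It does not compute H3 cells; it only shows that Yosemite and Death Valley are much closer to each other than either is to Yellowstone.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # spherical-Earth approximation

# (latitude, longitude) values from the sample documents
PARKS = {
    "Yellowstone": (44.42, -110.59),
    "Yosemite": (37.87, -119.53),
    "Death Valley": (36.53, -116.93),
}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


if __name__ == "__main__":
    for (n1, p1), (n2, p2) in combinations(PARKS.items(), 2):
        print(f"{n1} - {n2}: {haversine_km(p1, p2):.0f} km")
```

Yosemite and Death Valley are less than 300 km apart, whereas each is more than 1,000 km from Yellowstone.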
Now run a high-precision request:
GET national_parks/_search
{
"aggregations": {
"grouped": {
"geohex_grid": {
"field": "location",
"precision": 6
}
}
}
}
The response places each document in its own grid cell:
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"hits" : [
{
"_index" : "national_parks",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "Yellowstone National Park",
"location" : "44.42, -110.59"
}
},
{
"_index" : "national_parks",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"name" : "Yosemite National Park",
"location" : "37.87, -119.53"
}
},
{
"_index" : "national_parks",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"name" : "Death Valley National Park",
"location" : "36.53, -116.93"
}
}
]
},
"aggregations" : {
"grouped" : {
"buckets" : [
{
"key" : "8629ab6dfffffff",
"doc_count" : 1
},
{
"key" : "8629857a7ffffff",
"doc_count" : 1
},
{
"key" : "862896017ffffff",
"doc_count" : 1
}
]
}
}
}
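Each bucket key is an H3 cell index written in hexadecimal. Assuming the standard H3 64-bit index layout (bits 59 to 62 hold the index mode, which is 1 for a cell; bits 52 to 55 hold the resolution; bits 45 to 51 hold the base cell), the following Python sketch decodes the keys from the preceding response. The resolution it recovers matches the requested precision of 6. The names H3CellInfo and decode_h3_key are hypothetical.

```python
from typing import NamedTuple


class H3CellInfo(NamedTuple):
    mode: int
    resolution: int
    base_cell: int


def decode_h3_key(key: str) -> H3CellInfo:
    """Extract header fields from a hexadecimal H3 cell index (a bucket key)."""
    h = int(key, 16)
    return H3CellInfo(
        mode=(h >> 59) & 0xF,        # 1 means the index identifies a cell
        resolution=(h >> 52) & 0xF,  # H3 resolution, 0 to 15
        base_cell=(h >> 45) & 0x7F,  # one of the 122 resolution-0 cells
    )


# Buckets copied from the high-precision response
buckets = [
    {"key": "8629ab6dfffffff", "doc_count": 1},
    {"key": "8629857a7ffffff", "doc_count": 1},
    {"key": "862896017ffffff", "doc_count": 1},
]

if __name__ == "__main__":
    for bucket in buckets:
        print(bucket["key"], decode_h3_key(bucket["key"]), bucket["doc_count"])
```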
High-precision requests are resource intensive, so we recommend using a filter such as geo_bounding_box to limit the geographical area. For example, the following query applies a filter to limit the search area:
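A request of this kind can be sketched as a Python dictionary, for example to pass as the body of a client's search call. The sketch assumes a filter aggregation named filtered that wraps the geohex_grid aggregation named grouped (the names that appear in the response), sets size to 0 so that no hits are returned, and reuses the box from the bounds example later in this section (latitudes 36 to 38, longitudes -120 to -116); these choices are assumptions. It also checks locally which sample parks fall inside that box. The function names are hypothetical.

```python
import json

# Assumed bounding box: top-left and bottom-right corners as lat/lon objects
BOX = {
    "top_left": {"lat": 38.0, "lon": -120.0},
    "bottom_right": {"lat": 36.0, "lon": -116.0},
}

PARKS = {
    "Yellowstone National Park": (44.42, -110.59),
    "Yosemite National Park": (37.87, -119.53),
    "Death Valley National Park": (36.53, -116.93),
}


def filtered_geohex_query(box: dict, precision: int = 6) -> dict:
    """Build a search body: a geo_bounding_box filter wrapping a geohex_grid aggregation."""
    return {
        "size": 0,  # return aggregations only, no hits
        "aggregations": {
            "filtered": {
                "filter": {"geo_bounding_box": {"location": box}},
                "aggregations": {
                    "grouped": {
                        "geohex_grid": {"field": "location", "precision": precision}
                    }
                },
            }
        },
    }


def in_box(lat: float, lon: float, box: dict) -> bool:
    """Local check mirroring the box; does not handle boxes crossing the antimeridian."""
    tl, br = box["top_left"], box["bottom_right"]
    return br["lat"] <= lat <= tl["lat"] and tl["lon"] <= lon <= br["lon"]


if __name__ == "__main__":
    print(json.dumps(filtered_geohex_query(BOX), indent=2))
    print([name for name, (lat, lon) in PARKS.items() if in_box(lat, lon, BOX)])
```

With the opensearch-py client, for instance, such a dictionary would typically be passed as client.search(index="national_parks", body=...).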
The response shows that two documents fall within the geo_bounding_box bounds ("doc_count" : 2) and places each of them in its own grid cell:
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"filtered" : {
"doc_count" : 2,
"grouped" : {
"buckets" : [
{
"key" : "8629ab6dfffffff",
"doc_count" : 1
},
{
"key" : "8629857a7ffffff",
"doc_count" : 1
}
]
}
}
}
}
You can also restrict the geographical area by providing the coordinates of the bounding envelope in the bounds parameter. Both bounds and geo_bounding_box coordinates can be specified in any of the geopoint formats. The following query uses the well-known text (WKT) "POINT(longitude latitude)" format for the bounds parameter:
GET national_parks/_search
{
"size": 0,
"aggregations": {
"grouped": {
"geohex_grid": {
"field": "location",
"precision": 6,
"bounds": {
"top_left": "POINT (-120 38)",
"bottom_right": "POINT (-116 36)"
}
}
}
}
}
The response contains only the two buckets for the documents that are within the specified bounds.
The bounds parameter can be used with or without the geo_bounding_box filter; these two parameters are independent and can have any spatial relationship to each other.