Geospatial Support in ElasticSearch

1. Introduction

Elasticsearch is best known for its full-text search capabilities, but it also features full geospatial support.

We can find more about setting up Elasticsearch and getting started in this previous article.

Let’s take a look at how we can save geo data in Elasticsearch and how we can search that data using geo queries.

2. Geo Data Type

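To enable geo queries, we need to create the index mapping manually and explicitly declare the geo fields.

Dynamic mapping won’t work for geo types: Elasticsearch doesn’t infer them from the data, so a latitude-longitude object, for example, would be mapped as an object with two ordinary numeric fields.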

Elasticsearch offers two ways to represent geodata:

  1. Latitude-longitude pairs, using the geo_point field type

  2. Complex shapes defined in GeoJSON, using the geo_shape field type

Let’s take a more in-depth look at each of the above categories:

2.1. Geo Point Data Type

The geo_point field type accepts latitude-longitude pairs that can be used to:

  • Find points within a certain distance of a central point

  • Find points within a box or a polygon

  • Aggregate documents geographically or by distance from a central point

  • Sort documents by distance

Below is a sample mapping for a field that stores geo point data:

PUT /index_name
{
    "mappings": {
        "TYPE_NAME": {
            "properties": {
                "location": {
                    "type": "geo_point"
                }
            }
        }
    }
}

As we can see from the above example, the type of the location field is geo_point. Thus, we can now provide a latitude-longitude pair in the location field.
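Since this article works with Java, it may help to see how such a mapping could be sent from Java code. The sketch below is our own illustration and not an Elasticsearch client API. It assumes Java 11+ (for java.net.http), a node reachable at http://localhost:9200, and the placeholder index_name and TYPE_NAME from the mapping above. It only builds the PUT request and does not send it.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Illustrative helper (our own name): builds, but does not send, the PUT request
// that creates an index with a geo_point "location" field. The base URL, index
// name and type name are assumptions/placeholders taken from the mapping above.
final class GeoPointMappingRequest {

    // Same structure as the JSON mapping above, written as a plain string.
    static String mappingJson(String typeName) {
        return "{"
            + "\"mappings\":{"
            + "\"" + typeName + "\":{"
            + "\"properties\":{"
            + "\"location\":{\"type\":\"geo_point\"}"
            + "}}}}";
    }

    static HttpRequest build(String baseUrl, String indexName, String typeName) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + indexName))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(mappingJson(typeName)))
            .build();
    }
}
```

To actually create the index, the request could then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. The sketch mirrors the typed mapping above. Elasticsearch 7 and later no longer accept that type level by default.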

2.2. Geo Shape Data Type

Unlike geo_point, geo_shape lets us save and search complex shapes such as polygons and rectangles. The geo_shape data type must be used when we want to search documents that contain shapes other than points.

Let’s take a look at a mapping for the geo_shape data type:

PUT /index_name
{
    "mappings": {
        "TYPE_NAME": {
            "properties": {
                "location": {
                    "type": "geo_shape",
                    "tree": "quadtree",
                    "precision": "1m"
                }
            }
        }
    }
}

The above mapping indexes the location field using the quadtree implementation with a precision of one meter.

Elasticsearch breaks the provided geo shape down into a series of small grid squares, commonly called rasters. Each square is identified by a geohash or, with the quadtree mapping above, by a quadtree cell.

Depending on our requirements, we can control how geo shape fields are indexed. For example, when we’re searching documents for navigation, precision down to one meter becomes critical, as a coarser grid may lead to an incorrect path.

Whereas if we’re looking for sightseeing places, a precision of 10-50 meters can be acceptable.

One thing we need to keep in mind while indexing geo shape data is that we’re always trading performance for accuracy. With higher precision, Elasticsearch generates more terms, which leads to increased memory usage. Hence, we need to be very cautious when selecting the mapping for a geo shape.
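To get a feel for this trade-off, here's a back-of-the-envelope sketch. It is our own approximation, not Elasticsearch's internal formula. It counts how many quadtree levels are needed before a cell at the equator becomes no wider than a requested precision. Each level halves the cell width and splits every cell into four.

```java
// Rough estimate (our own approximation, not Elasticsearch's exact computation)
// of the quadtree depth needed for a given precision. Starting from the full
// equatorial circumference, each level halves the cell width.
final class QuadTreeDepth {

    // Equatorial circumference of the WGS84 ellipsoid, in meters.
    static final double EQUATOR_METERS = 40_075_016.686;

    static int levelsFor(double precisionMeters) {
        if (precisionMeters <= 0) {
            throw new IllegalArgumentException("precision must be positive");
        }
        int levels = 0;
        double cellWidth = EQUATOR_METERS;
        // Keep splitting until a cell is no wider than the requested precision.
        while (cellWidth > precisionMeters) {
            cellWidth /= 2;
            levels++;
        }
        return levels;
    }
}
```

With this estimate, `levelsFor(50.0)` returns 20, while `levelsFor(1.0)` returns 26. In this simple model, those six extra levels mean covering the same area at full resolution takes about 4^6 = 4096 times as many cells. That growth is the source of the extra terms and memory.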

We can find more mapping options for the geo_shape data type on the official ES site.

3. Different Ways to Save Geo Point Data


3.1. Latitude Longitude Object

PUT index_name/index_type/1
{
    "location": {
        "lat": 23.02,
        "lon": 72.57
    }
}

Here, the geo point location is saved as an object with lat and lon as keys.

3.2. Latitude Longitude Pair

{
    "location": "23.02,72.57"
}

Here, the location is expressed as a latitude-longitude pair in plain string format. Note the order: in the string format, latitude comes first.

3.3. Geo Hash

{
    "location": "tsj4bys"
}

We can also provide geo point data in the form of a geohash, as shown in the example above. An online tool can be used to convert a latitude-longitude pair to a geohash.
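As an alternative to an online tool, the conversion can also be done locally. The sketch below is a plain Java implementation of the standard geohash algorithm; the class name GeoHash is our own. It alternately bisects the longitude and latitude ranges and records one bit per step, starting with longitude. It then maps each group of five bits to a character of the base-32 geohash alphabet.

```java
// Minimal sketch of the standard geohash algorithm (GeoHash is our own class name).
final class GeoHash {

    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    static String encode(double lat, double lon, int precision) {
        if (precision < 1) {
            throw new IllegalArgumentException("precision must be at least 1");
        }
        double[] latRange = {-90.0, 90.0};
        double[] lonRange = {-180.0, 180.0};
        StringBuilder hash = new StringBuilder(precision);
        boolean evenBit = true; // even bits refine longitude, odd bits latitude
        int bit = 0;
        int ch = 0;
        while (hash.length() < precision) {
            double[] range = evenBit ? lonRange : latRange;
            double value = evenBit ? lon : lat;
            double mid = (range[0] + range[1]) / 2;
            ch <<= 1;
            if (value >= mid) {
                ch |= 1;        // upper half
                range[0] = mid;
            } else {
                range[1] = mid; // lower half
            }
            evenBit = !evenBit;
            if (++bit == 5) {   // five bits make one base-32 character
                hash.append(BASE32.charAt(ch));
                bit = 0;
                ch = 0;
            }
        }
        return hash.toString();
    }

    // Returns {lat, lon} of the centre of the cell described by the hash.
    static double[] decodeCenter(String hash) {
        double[] latRange = {-90.0, 90.0};
        double[] lonRange = {-180.0, 180.0};
        boolean evenBit = true;
        for (char c : hash.toCharArray()) {
            int cd = BASE32.indexOf(c);
            if (cd < 0) {
                throw new IllegalArgumentException("invalid geohash character: " + c);
            }
            for (int mask = 16; mask > 0; mask >>= 1) {
                double[] range = evenBit ? lonRange : latRange;
                double mid = (range[0] + range[1]) / 2;
                if ((cd & mask) != 0) {
                    range[0] = mid;
                } else {
                    range[1] = mid;
                }
                evenBit = !evenBit;
            }
        }
        return new double[] {(latRange[0] + latRange[1]) / 2, (lonRange[0] + lonRange[1]) / 2};
    }
}
```

For instance, `GeoHash.encode(23.02, 72.57, 3)` yields "ts5". A seven-character hash such as the one above describes a cell roughly 150 m across at the equator. `decodeCenter` goes the other way and can be used to check where a given hash lies.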

3.4. Longitude Latitude Array

{
    "location": [72.57, 23.02]
}

When latitude and longitude are supplied as an array, the order is reversed: longitude comes first. Initially, latitude-longitude order was used for both strings and arrays, but the array order was later reversed to match the format used by GeoJSON.
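This ordering difference is an easy source of bugs when data arrives in mixed formats. Below is a hypothetical helper; GeoPointFormats is not part of any library. It normalizes the string and array forms into a single {lat, lon} pair.

```java
// Hypothetical helper (not an Elasticsearch API) that converts the string and
// array geo_point forms into one {lat, lon} pair, making the axis order explicit.
final class GeoPointFormats {

    // String form "lat,lon", e.g. "23.02,72.57": latitude comes first.
    static double[] fromString(String value) {
        String[] parts = value.split(",");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected \"lat,lon\" but got: " + value);
        }
        return checked(Double.parseDouble(parts[0].trim()), Double.parseDouble(parts[1].trim()));
    }

    // Array form [lon, lat], e.g. [72.57, 23.02]: longitude comes first (GeoJSON order).
    static double[] fromArray(double[] lonLat) {
        if (lonLat.length != 2) {
            throw new IllegalArgumentException("expected [lon, lat]");
        }
        return checked(lonLat[1], lonLat[0]);
    }

    // Basic range check. Note it only detects a swapped pair when the value in the
    // latitude slot falls outside [-90, 90].
    private static double[] checked(double lat, double lon) {
        if (lat < -90 || lat > 90 || lon < -180 || lon > 180) {
            throw new IllegalArgumentException("out of range: lat=" + lat + ", lon=" + lon);
        }
        return new double[] {lat, lon};
    }
}
```

Both `fromString("23.02,72.57")` and `fromArray(new double[] {72.57, 23.02})` describe the same point and return {23.02, 72.57}.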

4. Different Ways to Save Geo Shape Data


4.1. Point

POST /index/type
{
    "location" : {
        "type" : "point",
        "coordinates" : [72.57, 23.02]
    }
}

Here, the geo shape type that we’re trying to insert is a point. Note that the location field holds a nested object with two fields, type and coordinates. These meta-fields help Elasticsearch identify the geo shape and its actual data. As with the array format for geo points, the coordinates are given in [lon, lat] order.

4.2. LineString

POST /index/type
{
    "location" : {
        "type" : "linestring",
        "coordinates" : [[ … ], [ … ]]
    }
}

Here, we're inserting a linestring geo shape. The coordinates for this linestring consist of two points, i.e. the start point and the end point. The linestring geo shape is very helpful for navigation use cases.
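Since GeoJSON coordinates are written as [lon, lat], it's easy to swap the axes when building a route from latitude-longitude pairs in code. Here's a hypothetical helper; LineStringJson is our own name. It renders a list of {lat, lon} points as the location body of a linestring document, swapping each pair on output.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: renders {lat, lon} points as a GeoJSON linestring body,
// emitting each position in GeoJSON's [lon, lat] order.
final class LineStringJson {

    static String toGeoJson(List<double[]> latLonPoints) {
        if (latLonPoints.size() < 2) {
            throw new IllegalArgumentException("a linestring needs at least two points");
        }
        String coords = latLonPoints.stream()
            .map(p -> "[" + p[1] + ", " + p[0] + "]") // swap to [lon, lat]
            .collect(Collectors.joining(", "));
        return "{\"location\": {\"type\": \"linestring\", \"coordinates\": [" + coords + "]}}";
    }
}
```

String concatenation keeps the sketch dependency-free. In real code, a JSON library would be the safer way to build the body.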

4.3. Polygon

POST /index/type
{
    "location" : {
        "type" : "polygon",
        "coordinates" : [
            [ [10.0, 0.0], [11.0, 0.0], [11.0, 1.0], [10.0, 1.0], [10.0, 0.0] ]
        ]
    }
}

Here, we're inserting a polygon geo shape. Note the coordinates in the example above: the first and last coordinates of a polygon must always match, i.e. the polygon must be closed.
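Since the first and last positions must match, it can help to check rings before indexing. The hypothetical PolygonRings helper below tests whether a ring of [lon, lat] positions is closed. If the ring isn't closed, it returns a copy with the first position appended. Following the GeoJSON definition of a linear ring, it also treats rings with fewer than four positions as not closed.

```java
import java.util.Arrays;

// Hypothetical helper for the closed-ring rule: rings are arrays of [lon, lat] positions.
final class PolygonRings {

    // GeoJSON linear ring: four or more positions, first equal to last.
    static boolean isClosed(double[][] ring) {
        return ring.length >= 4 && Arrays.equals(ring[0], ring[ring.length - 1]);
    }

    // Returns the ring unchanged if it is empty or already ends where it starts;
    // otherwise returns a copy with the first position repeated at the end.
    static double[][] close(double[][] ring) {
        if (ring.length == 0 || Arrays.equals(ring[0], ring[ring.length - 1])) {
            return ring;
        }
        double[][] closed = Arrays.copyOf(ring, ring.length + 1);
        closed[ring.length] = ring[0].clone();
        return closed;
    }
}
```

Applied to the polygon above without its final [10.0, 0.0], `close` restores the fifth position, and `isClosed` then returns true.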

Elasticsearch also supports other GeoJSON structures, along with two types of its own (Envelope and Circle) that aren't part of the GeoJSON standard. The complete list of other supported formats is:

  • MultiPoint

  • MultiLineString

  • MultiPolygon

  • GeometryCollection

  • Envelope

  • Circle

We can find examples of these formats on the official ES site (https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-shape.html#input-structure).

For all structures, the inner type and coordinates fields are mandatory. Also, due to their complex structure, geo shape fields currently can't be sorted on or retrieved directly in Elasticsearch. Thus, the only way to retrieve geo shape fields is from the _source field.

5. Elasticsearch Geo Queries

Now that we know how to insert documents containing geo data, let's dive into fetching those records using geo queries. But before we start, we'll need the following Maven dependencies to support the Java API for geo queries:

<dependency>
    <groupId>org.locationtech.spatial4j</groupId>
    <artifactId>spatial4j</artifactId>
    <version>0.7</version>
</dependency>
<dependency>
    <groupId>com.vividsolutions</groupId>
    <artifactId>jts</artifactId>
    <version>1.13</version>
    <exclusions>
        <exclusion>
            <groupId>xerces</groupId>
            <artifactId>xercesImpl</artifactId>
        </exclusion>
    </exclusions>
</dependency>

We can also search for these dependencies in the Maven Central repository (https://search.maven.org/classic/#search%7Cgav%7C1%7Cg%3A%22org.locationtech.spatial4j%22%20AND%20a%3A%22spatial4j%22).

Elasticsearch supports several types of geo queries:

5.1. Geo Shape Query

This query requires the geo_shape mapping.

Similar to the geo_shape type, the geo_shape query uses GeoJSON structures to query documents.

Below is a sample query that fetches all documents that fall within the given top-left and bottom-right coordinates:

{
    "query": {
        "bool": {
            "must": {
                "match_all": {}
            },
            "filter": {
                "geo_shape": {
                    "region": {
                        "shape": {
                            "type": "envelope",
                            "coordinates" : [[75.00, 30.2], [80.1, 25.0]]
                        },
                        "relation": "within"
                    }
                }
            }
        }
    }
}

Here, relation determines the spatial relation operator used at search time.

Below is the list of supported operators:

  • INTERSECTS (default): returns all documents whose geo_shape field intersects the query geometry

  • DISJOINT: retrieves all documents whose geo_shape field has nothing in common with the query geometry

  • WITHIN: gets all documents whose geo_shape field is within the query geometry

  • CONTAINS: returns all documents whose geo_shape field contains the query geometry

Similarly, we can query using different GeoJSON shapes.

The Java code for the above query is as follows:

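import org.elasticsearch.common.geo.ShapeRelation;
import org.elasticsearch.common.geo.builders.ShapeBuilders;
import org.elasticsearch.index.query.QueryBuilders;
import com.vividsolutions.jts.geom.Coordinate;

QueryBuilders
    .geoShapeQuery("region", ShapeBuilders.newEnvelope(
        new Coordinate(75.00, 30.2),   // top-left (lon, lat)
        new Coordinate(80.1, 25.0)))   // bottom-right (lon, lat)
    .relation(ShapeRelation.WITHIN);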
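To make the four relation operators above more concrete, here's a simplified model for axis-aligned rectangles (envelopes) on a flat lon/lat plane. It is our own illustration. It ignores the dateline and the Earth's curvature, and it is not how Elasticsearch evaluates relations internally. Each method answers the question for a document shape against a query shape.

```java
// Simplified, illustrative model of the spatial relations for envelopes on a
// flat lon/lat plane (no dateline handling, no curvature).
final class EnvelopeRelations {

    // An envelope given by its min/max longitude (x) and latitude (y).
    static final class Envelope {
        final double minX, minY, maxX, maxY;

        Envelope(double minX, double minY, double maxX, double maxY) {
            this.minX = minX;
            this.minY = minY;
            this.maxX = maxX;
            this.maxY = maxY;
        }
    }

    // The shapes share at least one point (touching edges count).
    static boolean intersects(Envelope doc, Envelope query) {
        return doc.minX <= query.maxX && query.minX <= doc.maxX
            && doc.minY <= query.maxY && query.minY <= doc.maxY;
    }

    // The shapes have nothing in common.
    static boolean disjoint(Envelope doc, Envelope query) {
        return !intersects(doc, query);
    }

    // The document shape lies entirely inside the query shape.
    static boolean within(Envelope doc, Envelope query) {
        return doc.minX >= query.minX && doc.maxX <= query.maxX
            && doc.minY >= query.minY && doc.maxY <= query.maxY;
    }

    // The document shape fully covers the query shape.
    static boolean contains(Envelope doc, Envelope query) {
        return within(query, doc);
    }
}
```

Take the query envelope spanning longitudes 75.0 to 80.1 and latitudes 25.0 to 30.2. A small document box inside it matches both WITHIN and INTERSECTS. A box far away, e.g. around 10° E, 0° N, matches only DISJOINT.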

5.2. Geo Bounding Box Query

The geo bounding box query fetches all documents whose geo_point location falls inside a given rectangle. Below is a sample bounding box query:

{
    "query": {
        "bool" : {
            "must" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_bounding_box" : {
                    "location" : {
                        "bottom_left" : [28.3, 30.5],
                        "top_right" : [31.8, 32.12]
                    }
                }
            }
        }
    }
}

Here's the Java code for the above bounding box query:

// setCorners(top, left, bottom, right): the arrays above are [lon, lat]
QueryBuilders
    .geoBoundingBoxQuery("location").setCorners(32.12, 28.3, 30.5, 31.8);

The geo bounding box query supports the same coordinate formats as the geo_point data type. Sample queries for each supported format can be found on the official site (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-geo-bounding-box-query.html#_accepted_formats).
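For intuition, here's a hypothetical helper, BoundingBox (our own name), that performs the same check for a single point. Its parameters follow the top, left, bottom, right order of `setCorners`. As an assumption of this sketch, a box whose left edge is greater than its right edge is treated as crossing the 180th meridian.

```java
// Hypothetical helper: is a single point inside a lat/lon bounding box?
// top/bottom are latitudes, left/right are longitudes (setCorners order).
final class BoundingBox {

    static boolean contains(double top, double left, double bottom, double right,
                            double lat, double lon) {
        if (lat > top || lat < bottom) {
            return false;
        }
        if (left <= right) {
            return lon >= left && lon <= right;
        }
        // Sketch assumption: left > right means the box wraps around the
        // dateline, e.g. left = 170, right = -170.
        return lon >= left || lon <= right;
    }
}
```

With the box from the query above (top 32.12, left 28.3, bottom 30.5, right 31.8), the point (31.0, 30.0) is inside. The point (29.976, 31.131) is outside because its latitude lies below the bottom edge.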

5.3. Geo Distance Query

The geo distance query filters all documents whose location lies within the specified distance of a point.

Here's a sample geo_distance query:

{
    "query": {
        "bool" : {
            "must" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_distance" : {
                    "distance" : "10miles",
                    "location" : [31.131, 29.976]
                }
            }
        }
    }
}

And here's the Java code for the above query:

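import org.elasticsearch.common.unit.DistanceUnit;
import org.elasticsearch.index.query.QueryBuilders;

// point(lat, lon): the reverse of the [lon, lat] array in the JSON query
QueryBuilders
    .geoDistanceQuery("location")
    .point(29.976, 31.131)
    .distance(10, DistanceUnit.MILES);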

Similar to geo_point, the geo distance query also supports multiple formats for passing location coordinates. More details on the supported formats can be found on the official site (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-geo-distance-query.html#_accepted_formats_2).
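To see roughly what this filter computes, here's a sketch that checks whether a single point lies within a given number of miles of a centre. It uses the haversine great-circle formula on a spherical Earth. GeoDistance is our own class name, and this is an approximation for intuition; Elasticsearch's internal distance calculation may differ slightly.

```java
// Illustrative haversine distance on a spherical Earth (our own sketch, not
// Elasticsearch's implementation).
final class GeoDistance {

    static final double EARTH_RADIUS_METERS = 6_371_008.8; // mean Earth radius
    static final double METERS_PER_MILE = 1609.344;

    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
            + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
            * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(a));
    }

    // True if (lat, lon) is within the given number of miles of the centre.
    static boolean withinMiles(double centerLat, double centerLon,
                               double lat, double lon, double miles) {
        return haversineMeters(centerLat, centerLon, lat, lon) <= miles * METERS_PER_MILE;
    }
}
```

For example, a point 0.1 degrees of latitude north of the centre (29.976, 31.131) is about 11.1 km (roughly 6.9 miles) away. So `withinMiles` returns true for the 10-mile radius used above, and false for a point 0.2 degrees north.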

5.4. Geo Polygon Query

The geo polygon query filters all documents whose points fall within a polygon defined by a list of points.

Let's have a quick look at a sample query:

{
    "query": {
        "bool" : {
            "must" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_polygon" : {
                    "location" : {
                        "points" : [
                            {"lat" : 22.733, "lon" : 68.859},
                            {"lat" : 24.733, "lon" : 68.859},
                            {"lat" : 23, "lon" : 70.859}
                        ]
                    }
                }
            }
        }
    }
}

And here's the Java code for this query:

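import java.util.ArrayList;
import java.util.List;
import org.elasticsearch.common.geo.GeoPoint;
import org.elasticsearch.index.query.QueryBuilders;

List<GeoPoint> allPoints = new ArrayList<>();
allPoints.add(new GeoPoint(22.733, 68.859));
allPoints.add(new GeoPoint(24.733, 68.859));
allPoints.add(new GeoPoint(23, 70.859));

QueryBuilders.geoPolygonQuery("location", allPoints);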

The geo polygon query also accepts points in the following formats:

  • lat-lon as an array: [lon, lat]

  • lat-lon as a string: "lat, lon"

  • geohash

The geo_point data type is required in order to use this query.
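To illustrate what "falls within the polygon" means, here's a sketch of the classic even-odd (ray casting) point-in-polygon test on a flat lat/lon plane. PointInPolygon is our own class name. The sketch ignores the Earth's curvature and the dateline, so it is not a replacement for the query itself.

```java
// Illustrative even-odd (ray casting) test: count how many polygon edges a ray
// cast from the point towards increasing longitude crosses; an odd count means inside.
final class PointInPolygon {

    // polygon holds {lat, lon} vertices; the ring does not need to be closed.
    static boolean contains(double[][] polygon, double lat, double lon) {
        boolean inside = false;
        for (int i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
            double latI = polygon[i][0], lonI = polygon[i][1];
            double latJ = polygon[j][0], lonJ = polygon[j][1];
            // Does edge (j, i) straddle the point's latitude?
            boolean straddles = (latI > lat) != (latJ > lat);
            // Longitude at which the edge crosses that latitude.
            if (straddles && lon < (lonJ - lonI) * (lat - latI) / (latJ - latI) + lonI) {
                inside = !inside;
            }
        }
        return inside;
    }
}
```

Take the triangle from the query above. Its centroid, roughly (23.489, 69.526), is reported as inside. The point (23.5, 72.0), east of every vertex, is reported as outside.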

6. Conclusion

In this article, we discussed the mapping options for indexing geo data, i.e. geo_point and geo_shape.

We also went through different ways to store geo data, and finally, we looked at geo queries and the Java API for filtering results with them.

As always, the code is available in this GitHub project (https://github.com/eugenp/tutorials/tree/master/persistence-modules/spring-data-elasticsearch).
