Guide to Elasticsearch in Java

1. Overview

In this article, we’re going to dive into some key concepts related to full-text search engines, with a special focus on Elasticsearch.

As this is a Java-oriented article, we’re not going to give a detailed step-by-step tutorial on how to set up Elasticsearch or show how it works under the hood. Instead, we’re going to target the Java client and how to use its main features: index, delete, get and search.

2. Setup

In order to install Elasticsearch on your machine, please refer to the official setup guide.

The installation process is pretty simple: just download the zip/tar package and run the elasticsearch script file (elasticsearch.bat for Windows users).

By default, Elasticsearch listens on port 9200 for incoming HTTP queries. We can verify that it has launched successfully by opening the http://localhost:9200/ URL in a browser:

{
  "name" : "GEpcsab",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "z3FfAe6gRMmSmeWBIOihJg",
  "version" : {
    "number" : "5.6.10",
    "build_hash" : "b727a60",
    "build_date" : "2018-06-06T15:48:34.860Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.1"
  },
  "tagline" : "You Know, for Search"
}
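
The same check can also be scripted from Java using only the standard library. The following is a minimal sketch and not part of the Elasticsearch client: the ClusterCheck class and its fetch() and extractVersion() helpers are hypothetical names introduced here, and the regular expression assumes that the response keeps the "number" field shown above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClusterCheck {

    // Matches the "number" : "x.y.z" entry of the root endpoint's response
    private static final Pattern VERSION =
      Pattern.compile("\"number\"\\s*:\\s*\"([^\"]+)\"");

    /** Returns the version number found in the JSON body, or null if there is none. */
    static String extractVersion(String json) {
        Matcher matcher = VERSION.matcher(json);
        return matcher.find() ? matcher.group(1) : null;
    }

    /** Downloads the body served at the given URL as a UTF-8 string. */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(2000);
        conn.setReadTimeout(2000);
        try (InputStream in = conn.getInputStream();
             Scanner scanner = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            // The \A delimiter makes the scanner return the whole stream as one token
            return scanner.hasNext() ? scanner.next() : "";
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        try {
            String body = fetch("http://localhost:9200/");
            System.out.println("Elasticsearch version: " + extractVersion(body));
        } catch (IOException e) {
            System.out.println("Elasticsearch is not reachable: " + e.getMessage());
        }
    }
}
```

If no node is listening on port 9200, the program reports the connection error instead of failing with a stack trace.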

3. Maven Configuration

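Now that we have our basic Elasticsearch cluster up and running, let’s jump straight to the Java client. First of all, we need to have the following Maven dependencies declared in our pom.xml file. The second one provides the PreBuiltTransportClient class used in the next section:

<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>5.6.0</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>transport</artifactId>
    <version>5.6.0</version>
</dependency>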

We can always check the latest versions hosted on Maven Central.

4. Java API

Before we look at how to use the main Java API features, we need to instantiate the transport client:

Client client = new PreBuiltTransportClient(
  Settings.builder().put("client.transport.sniff", true)
                    .put("cluster.name","elasticsearch").build())
  .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9300));

4.1. Indexing Documents

The prepareIndex() method allows us to store an arbitrary JSON document and make it searchable:

@Test
public void givenJsonString_whenJavaObject_thenIndexDocument() {
    String jsonObject = "{\"age\":10,\"dateOfBirth\":1471466076564,"
      +"\"fullName\":\"John Doe\"}";
    IndexResponse response = client.prepareIndex("people", "Doe")
      .setSource(jsonObject, XContentType.JSON).get();

    String id = response.getId();
    String index = response.getIndex();
    String type = response.getType();
    long version = response.getVersion();

    assertEquals(Result.CREATED, response.getResult());
    // A newly created document starts at version 1
    assertEquals(1, version);
    assertEquals("people", index);
    assertEquals("Doe", type);
}

When running the test, make sure to declare the path.home variable; otherwise, the following exception may be thrown:

java.lang.IllegalStateException: path.home is not configured

After running the Maven command mvn clean install -Des.path.home=C:\elastic, the JSON document will be stored with people as the index and Doe as the type.

Note that it is possible to use any JSON Java library to create and process your documents. If you are not familiar with any of these, you can use Elasticsearch helpers to generate your own JSON documents:

XContentBuilder builder = XContentFactory.jsonBuilder()
  .startObject()
  .field("fullName", "Test")
  .field("dateOfBirth", new Date())
  .field("age", 10)
  .endObject();
IndexResponse response = client.prepareIndex("people", "Doe")
  .setSource(builder).get();

assertEquals(Result.CREATED, response.getResult());
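
Since the source passed to setSource() is ultimately plain JSON text, a flat document can even be assembled with the standard library alone. The sketch below is only an illustration under narrow assumptions: FlatJson and toJson() are hypothetical names, only string, number and boolean values are supported, and escaping is limited to quotes and backslashes. For anything beyond that, a real JSON library or the XContentBuilder shown above is the better choice.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class FlatJson {

    /** Serializes a flat map of String, Number or Boolean values to a JSON object. */
    static String toJson(Map<String, Object> fields) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, Object> entry : fields.entrySet()) {
            Object value = entry.getValue();
            // Numbers and booleans are written bare, everything else as a quoted string
            String rendered = (value instanceof Number || value instanceof Boolean)
              ? String.valueOf(value)
              : quote(String.valueOf(value));
            json.add(quote(entry.getKey()) + ":" + rendered);
        }
        return json.toString();
    }

    // Escapes only backslashes and double quotes; control characters are not handled
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the insertion order of the fields
        Map<String, Object> person = new LinkedHashMap<>();
        person.put("age", 10);
        person.put("dateOfBirth", 1471466076564L);
        person.put("fullName", "John Doe");
        System.out.println(toJson(person));
    }
}
```

The resulting string has the same shape as the jsonObject literal used in the first indexing test, so it could be passed to setSource() together with XContentType.JSON.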

4.2. Querying Indexed Documents

Now that we have a typed searchable JSON document indexed, we can proceed and search using the prepareSearch() method:

SearchResponse response = client.prepareSearch().execute().actionGet();
List<SearchHit> searchHits = Arrays.asList(response.getHits().getHits());
List<Person> results = new ArrayList<Person>();
searchHits.forEach(
  hit -> results.add(JSON.parseObject(hit.getSourceAsString(), Person.class)));

The results returned by the actionGet() method are called hits; each hit refers to a JSON document matching the search request.

In this case, the query matches all the data stored in the cluster, although a search returns only the first 10 hits by default. Note that in this example we’re using the FastJson library in order to convert JSON Strings to Java objects.
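
The Person class used in the snippet above is not shown in this article. A plausible shape, assuming the field names of the documents indexed in section 4.1, is a plain JavaBean with a no-argument constructor plus getters and setters, which is what JSON binding libraries typically rely on. Treat the following as a hypothetical sketch rather than the class from the accompanying project:

```java
import java.util.Date;

// Hypothetical model class; declared without "public" only so that the
// sketch fits in a single file. In a real project it would live in Person.java.
class Person {

    private int age;
    private String fullName;
    private Date dateOfBirth;

    // JSON binding libraries usually need a no-argument constructor
    Person() {
    }

    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }

    public String getFullName() { return fullName; }
    public void setFullName(String fullName) { this.fullName = fullName; }

    public Date getDateOfBirth() { return dateOfBirth; }
    public void setDateOfBirth(Date dateOfBirth) { this.dateOfBirth = dateOfBirth; }

    @Override
    public String toString() {
        return fullName + " (" + age + ")";
    }

    public static void main(String[] args) {
        Person person = new Person();
        person.setFullName("John Doe");
        person.setAge(10);
        person.setDateOfBirth(new Date(1471466076564L));
        System.out.println(person);
    }
}
```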

We can enhance the request by adding parameters that customize the query using the QueryBuilders methods:

SearchResponse response = client.prepareSearch()
  .setTypes()
  .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
  .setPostFilter(QueryBuilders.rangeQuery("age").from(5).to(15))
  .execute()
  .actionGet();

4.3. Retrieving and Deleting Documents

The prepareGet() and prepareDelete() methods allow us to get or delete a JSON document from the cluster using its id:

GetResponse getResponse = client.prepareGet("people","Doe","1").get();
// getField() only exposes explicitly stored fields, so we read from the source
Object age = getResponse.getSourceAsMap().get("age");
// Process other fields
DeleteResponse deleteResponse = client.prepareDelete("people", "Doe", "5")
  .get();

The syntax is pretty straightforward: we just need to specify the index and the type alongside the object’s id.

5. QueryBuilders Examples

The QueryBuilders class provides a variety of static methods used as dynamic matchers to find specific entries in the cluster. When using the prepareSearch() method to look for specific JSON documents in the cluster, we can use query builders to customize the search results.

Here’s a list of the most common uses of the QueryBuilders API.

The matchAllQuery() method returns a QueryBuilder object that matches all documents in the cluster:

QueryBuilder matchAllQuery = QueryBuilders.matchAllQuery();

The rangeQuery() method matches documents where a field’s value is within a certain range:

QueryBuilder matchDocumentsWithinRange = QueryBuilders
  .rangeQuery("price").from(15).to(100);
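
To make the bounds concrete, here is an in-memory analogy written in plain Java. It is not Elasticsearch code: RangeDemo and inRange() are hypothetical names, and the sketch assumes the default behaviour of from() and to(), in which both bounds are included in the range.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class RangeDemo {

    /** Keeps the values v with from <= v <= to, i.e. both bounds are inclusive. */
    static List<Integer> inRange(List<Integer> values, int from, int to) {
        return values.stream()
          .filter(v -> v >= from && v <= to)
          .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> prices = Arrays.asList(10, 15, 42, 100, 101);
        // Same bounds as rangeQuery("price").from(15).to(100)
        System.out.println(inRange(prices, 15, 100)); // prints [15, 42, 100]
    }
}
```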

Given a field name, e.g. fullName, and a corresponding value, e.g. John Doe, the matchQuery() method matches all documents whose field matches that value:

QueryBuilder matchSpecificFieldQuery = QueryBuilders
  .matchQuery("fullName", "John Doe");

We can also use the multiMatchQuery() method to build a multi-field version of the match query:

QueryBuilder matchSpecificFieldQuery = QueryBuilders.multiMatchQuery(
  "Text I am looking for", "field_1", "field_2^3", "*_field_wildcard");

We can use the caret symbol (^) to boost specific fields.

In our example, field_2 has its boost value set to three, making it more important than the other fields. Note that it’s possible to use wildcards and regex queries, but beware of memory consumption and response-time delays when dealing with wildcards: a pattern like *_apples may have a huge impact on performance.

The coefficient of importance is used to order the result set of hits returned after executing the prepareSearch() method.
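
The field notation used in the multi-match example is easy to take apart by hand. The following plain-Java sketch is a toy model and not the parser Elasticsearch uses: FieldSpec, parse() and matches() are hypothetical names, a missing boost is assumed to mean 1.0, and * is assumed to stand for any run of characters in a field name.

```java
import java.util.regex.Pattern;

class FieldSpec {

    final String pattern; // field name, possibly containing '*' wildcards
    final float boost;    // 1.0 when no ^boost suffix is given

    FieldSpec(String pattern, float boost) {
        this.pattern = pattern;
        this.boost = boost;
    }

    /** Parses "name" or "name^boost", e.g. "field_2^3". */
    static FieldSpec parse(String spec) {
        int caret = spec.lastIndexOf('^');
        if (caret < 0) {
            return new FieldSpec(spec, 1.0f);
        }
        return new FieldSpec(
          spec.substring(0, caret), Float.parseFloat(spec.substring(caret + 1)));
    }

    /** True if the concrete field name matches the pattern, with '*' meaning "anything". */
    boolean matches(String fieldName) {
        String[] parts = pattern.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            // Quote the literal parts so that characters like '.' are not treated as regex
            regex.append(Pattern.quote(parts[i]));
        }
        return fieldName.matches(regex.toString());
    }

    public static void main(String[] args) {
        for (String spec : new String[] {"field_1", "field_2^3", "*_field_wildcard"}) {
            FieldSpec parsed = parse(spec);
            System.out.println(parsed.pattern + " -> boost " + parsed.boost);
        }
    }
}
```

The wildcard branch also hints at why such patterns can be costly: every candidate field name has to be tested against the pattern.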

If you are more familiar with the Lucene query syntax, you can use the simpleQueryStringQuery() method to customize search queries:

// In the simple query string syntax, | is the OR operator
QueryBuilder simpleStringQuery = QueryBuilders
  .simpleQueryStringQuery("+John -Doe | Janette");

As you can probably guess, we can use a syntax modeled on Lucene’s Query Parser to build simple, yet powerful queries. Here are some basic operators that can be used to build search queries:

  • The required operator (+): requires that a specific piece of text exists somewhere in fields of a document.

  • The prohibit operator (-): excludes all documents that contain the keyword declared after the - symbol.
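
To make the required (+) and prohibit (-) operators concrete, here is a toy in-memory model in plain Java. It is not how Elasticsearch evaluates queries: OperatorDemo and matches() are hypothetical names, the text is split on whitespace only, and terms without a prefix are simply ignored.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class OperatorDemo {

    /**
     * Returns true if the text contains every +term of the query
     * and none of its -terms (case-insensitive, whole words only).
     */
    static boolean matches(String query, String text) {
        Set<String> words = new HashSet<>(
          Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+")));
        for (String token : query.trim().split("\\s+")) {
            if (token.length() < 2) {
                continue; // nothing to check for empty or one-character tokens
            }
            char operator = token.charAt(0);
            String term = token.substring(1).toLowerCase(Locale.ROOT);
            if (operator == '+' && !words.contains(term)) {
                return false; // a required term is missing
            }
            if (operator == '-' && words.contains(term)) {
                return false; // a prohibited term is present
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matches("+John -Doe", "John Smith")); // prints true
        System.out.println(matches("+John -Doe", "John Doe"));   // prints false
        System.out.println(matches("+John -Doe", "Jane Smith")); // prints false
    }
}
```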

6. Conclusion

In this quick article, we’ve seen how to use Elasticsearch’s Java API to perform some of the common operations related to full-text search engines.

You can check out the example provided in this article in the GitHub project.
