hibernate-common-performance-problems-in-logs

Posted on: 2019-10-08 2019-10-08
Tags: Hibernate, Logging, Persistence

3 Common Hibernate Performance Issues and How to Find Them in Your Log File

1. Introduction

You’ve probably read some of the complaints about bad Hibernate
performance or maybe you’ve struggled with some of them yourself. I have
been using Hibernate for more than 15 years now and I have run into more
than enough of these issues.

Over the years, I’ve learned that these problems can be avoided and that
you can find a lot of them in your log file. In this post, I want to
show you how you can find and fix 3 of them.

2. Find and Fix Performance Problems

2.1. Log SQL Statements in Production

The first performance issue is extremely easy to spot and often ignored.
It’s the logging of SQL statements in a production environment.

Writing some log statements doesn’t sound like a big deal, and there are
a lot of applications out there which do exactly that. But it is
extremely inefficient, especially via System.out.println as Hibernate
does it if you set the show_sql parameter in your Hibernate
configuration to true:

Hibernate: select
    order0_.id as id1_2_,
    order0_.orderNumber as orderNum2_2_,
    order0_.version as version3_2_
  from purchaseOrder order0_
Hibernate: select
    items0_.order_id as order_id4_0_0_,
    items0_.id as id1_0_0_,
    items0_.id as id1_0_1_,
    items0_.order_id as order_id4_0_1_,
    items0_.product_id as product_5_0_1_,
    items0_.quantity as quantity2_0_1_,
    items0_.version as version3_0_1_
  from OrderItem items0_
  where items0_.order_id=?
Hibernate: select
    items0_.order_id as order_id4_0_0_,
    items0_.id as id1_0_0_,
    items0_.id as id1_0_1_,
    items0_.order_id as order_id4_0_1_,
    items0_.product_id as product_5_0_1_,
    items0_.quantity as quantity2_0_1_,
    items0_.version as version3_0_1_
  from OrderItem items0_
  where items0_.order_id=?
Hibernate: select
    items0_.order_id as order_id4_0_0_,
    items0_.id as id1_0_0_,
    items0_.id as id1_0_1_,
    items0_.order_id as order_id4_0_1_,
    items0_.product_id as product_5_0_1_,
    items0_.quantity as quantity2_0_1_,
    items0_.version as version3_0_1_
  from OrderItem items0_
  where items0_.order_id=?

In one of my projects, I improved the performance by 20% within a few
minutes by setting show_sql to false. That’s the kind of achievement
you like to report in the next stand-up meeting ?

It’s pretty obvious how you can fix this performance issue. Just open
your configuration (e.g. your persistence.xml file) and set the
show_sql parameter to false. You don’t need this information in
production anyways.

But you might need them during development. If you don’t, you use 2
different Hibernate configurations (which you shouldn’t) you deactivated
the SQL statement logging there as well. The solution for that is to use
2 different log
configurations for development and production which are optimized for
the specific requirements of the runtime environment.

Development Configuration

The development configuration should provide as many useful information
as possible so that you can see how Hibernate interacts with the
database. You should therefore at least log the generated SQL statements
in your development configuration. You can do this by activating DEBUG
message for the org.hibernate.SQL category. If you also want to see
the values of your bind parameters, you have to set the log level of
org.hibernate.type.descriptor.sql to TRACE:

log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p [%c] - %m%n
log4j.rootLogger=info, stdout

# basic log level for all messages
log4j.logger.org.hibernate=info

# SQL statements and parameters
log4j.logger.org.hibernate.SQL=debug
log4j.logger.org.hibernate.type.descriptor.sql=trace

The following code snippet shows some example log messages which
Hibernate writes with this log configuration. As you can see, you get
detailed information about the executed SQL query and all set and
retrieved parameter values:

23:03:22,246 DEBUG SQL:92 - select
    order0_.id as id1_2_,
    order0_.orderNumber as orderNum2_2_,
    order0_.version as version3_2_
  from purchaseOrder order0_
  where order0_.id=1
23:03:22,254 TRACE BasicExtractor:61 - extracted value ([id1_2_] : [BIGINT]) - [1]
23:03:22,261 TRACE BasicExtractor:61 - extracted value ([orderNum2_2_] : [VARCHAR]) - [order1]
23:03:22,263 TRACE BasicExtractor:61 - extracted value ([version3_2_] : [INTEGER]) - [0]

Hibernate provides you with a lot more internal information about a
Session if you activate the Hibernate statistics. You can do this by
setting the system property hibernate.generate_statistics to true.

But please, only activate the statistics on your development or test
environment. Collecting all these information slows down your
application and you might create your performance problems yourself if
you activate it this in production.

You can see some example statistics in the following code snippet:

23:04:12,123 INFO StatisticalLoggingSessionEventListener:258 - Session Metrics {
 23793 nanoseconds spent acquiring 1 JDBC connections;
 0 nanoseconds spent releasing 0 JDBC connections;
 394686 nanoseconds spent preparing 4 JDBC statements;
 2528603 nanoseconds spent executing 4 JDBC statements;
 0 nanoseconds spent executing 0 JDBC batches;
 0 nanoseconds spent performing 0 L2C puts;
 0 nanoseconds spent performing 0 L2C hits;
 0 nanoseconds spent performing 0 L2C misses;
 9700599 nanoseconds spent executing 1 flushes (flushing a total of 9 entities and 3 collections);
 42921 nanoseconds spent executing 1 partial-flushes (flushing a total of 0 entities and 0 collections)
}

I regularly use these statistics in my daily work to
find
performance issues before they occur in production and I could write
several posts just about that. So let’s just focus on the most important
ones.

The lines 2 to 5 show you how many JDBC connections and statements
Hibernate used during this session and how much time it spent on it. You
should always have a look at these values and compare them with your
expectations.

If there are a lot more statements than you expected, you most likely
have the
most
common performance problem, the n+1 select issue. You can find it in
nearly all applications, and it might create huge performance issues on
a bigger database. I explain this issue in more details in the next
section.

The lines 7 to 9 show how Hibernate interacted with the 2nd level cache.
This is one of
Hibernate’s
3 caches, and it stores entities in a session independent way. If you
use the 2nd level in your application, you should always monitor these
statistics to see if Hibernate gets the entities from there.

Production Configuration

The production configuration should be optimized for performance and
avoid any messages that are not urgently required. In general, that
means that you should only log error messages. If you use Log4j, you can
achieve that with the following configuration:

If you use Log4j, you can achieve that with the following configuration:

log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p [%c] - %m%n
log4j.rootLogger=info, stdout

# basic log level for all messages
log4j.logger.org.hibernate=error

2.2. N+1 Select Issue

As I already explained, the n+1 select issue is the most common
performance problem. A lot of developers blame the OR-Mapping concept
for this issue, and they’re not completely wrong. But you can easily
avoid it if you understand how Hibernate treats lazily fetched
relationships. The developer is, therefore, to blame as well because
it’s his responsibility to avoid these kinds of issues. So let me
explain first why this issue exists and then show you an easy way to
prevent it. If you are already familiar with the n+1 select issues, you
can jump directly to the solution.

Hibernate provides a very convenient mapping for relationships between
entities. You just need an attribute with the type of the related entity
and a few annotations to define it:

@Entity
@Table(name = "purchaseOrder")
public class Order implements Serializable {

    @OneToMany(mappedBy = "order", fetch = FetchType.LAZY)
    private Set<OrderItem> items = new HashSet<OrderItem>();

    ...
}

When you now load an Order entity from the database, you just need to
call the getItems() method to get all items of this order. Hibernate
hides the required database queries to get the related OrderItem
entities from the database.

When you started with Hibernate, you probably learned that you should
use FetchType.LAZY for most of the relationships and that it’s the
default for to-many relationships. This tells Hibernate to only fetch
the related entities if you use the attribute which maps the
relationship. Fetching only the data you need is a good thing in
general, but it also requires Hibernate to execute an additional query
to initialize each relationship. This can result in a huge number of
queries, if you work on a list of entities, like I do in the following
code snippet:

List<Order> orders = em.createQuery("SELECT o FROM Order o").getResultList();
for (Order order : orders) {
    log.info("Order: " + order.getOrderNumber());
    log.info("Number of items: " + order.getItems().size());
}

You probably wouldn’t expect that this few lines of code can create
hundreds or even thousands of database queries. But it does if you use
FetchType.LAZY for the relationship to the OrderItem entity:

22:47:30,065 DEBUG SQL:92 - select
    order0_.id as id1_2_,
    order0_.orderNumber as orderNum2_2_,
    order0_.version as version3_2_
  from purchaseOrder order0_
22:47:30,136 INFO NamedEntityGraphTest:58 - Order: order1
22:47:30,140 DEBUG SQL:92 - select
    items0_.order_id as order_id4_0_0_,
    items0_.id as id1_0_0_,
    items0_.id as id1_0_1_,
    items0_.order_id as order_id4_0_1_,
    items0_.product_id as product_5_0_1_,
    items0_.quantity as quantity2_0_1_,
    items0_.version as version3_0_1_
  from OrderItem items0_
  where items0_.order_id=?
22:47:30,171 INFO NamedEntityGraphTest:59 - Number of items: 2
22:47:30,171 INFO NamedEntityGraphTest:58 - Order: order2
22:47:30,172 DEBUG SQL:92 - select
    items0_.order_id as order_id4_0_0_,
    items0_.id as id1_0_0_,
    items0_.id as id1_0_1_,
    items0_.order_id as order_id4_0_1_,
    items0_.product_id as product_5_0_1_,
    items0_.quantity as quantity2_0_1_,
    items0_.version as version3_0_1_
  from OrderItem items0_
  where items0_.order_id=?
22:47:30,174 INFO NamedEntityGraphTest:59 - Number of items: 2
22:47:30,174 INFO NamedEntityGraphTest:58 - Order: order3
22:47:30,174 DEBUG SQL:92 - select
    items0_.order_id as order_id4_0_0_,
    items0_.id as id1_0_0_,
    items0_.id as id1_0_1_,
    items0_.order_id as order_id4_0_1_,
    items0_.product_id as product_5_0_1_,
    items0_.quantity as quantity2_0_1_,
    items0_.version as version3_0_1_
  from OrderItem items0_
  where items0_.order_id=?
22:47:30,176 INFO NamedEntityGraphTest:59 - Number of items: 2

Hibernate performs one query to get all Order entities and an
additional one for each of the n Order entities to initialize the
orderItem relationship. So you now know why this kind of issue is
called n+1 select issue and why it can create huge performance problems.

What makes it even worse is, that you often don’t recognize it on a
small test database, if you haven’t checked your Hibernate statistics.
The code snippet requires only a few dozen queries if the test database
doesn’t contain a lot of orders. But that will be completely different
if you use your productive database which contains several thousand of
them.

I said earlier that you can easily avoid these issues. And that’s true.
You just have to initialize the orderItem relationship when you select
the Order entities from the database.

But please, only do that, if you use the relationship in your business
code and don’t use FetchType.EAGER to always fetch the related
entities. That just replaces your n+1 issue with another performance
problem.

Initialize a Relationships with a @NamedEntityGraph

There are several different
options
to initialize relationships. I prefer to use a @NamedEntityGraph
which is is one of my favorite
features
introduced in JPA 2.1. It provides a query independent way to specify a
graph of entities which Hibernate shall fetch from the database. In
following code snippet, you can see an example of a simple graph that
lets Hibernate eagerly fetch the items attribute of an entity:

@Entity
@Table(name = "purchase_order")
@NamedEntityGraph(
  name = "graph.Order.items",
  attributeNodes = @NamedAttributeNode("items"))
public class Order implements Serializable {

    ...
}

There isn’t much you need to do to define an entity graph with a
@NamedEntityGraph annotation. You just have to provide a unique name
for the graph and one @NamedAttributeNode annotation for each
attribute Hibernate shall fetch eagerly. In this example, it’s only the
items attribute which maps the relationship between an Order and
several OrderItem entities.

Now you can use the entity graph to control the fetching behaviour or a
specific query. You, therefore, have to instantiate an EntityGraph
based on the @NamedEntityGraph definition and provide it as a hint to
the EntityManager.find() method or your query. I do this in the
following code snippet where I select the Order entity with id 1 from
the database:

EntityGraph graph = this.em.getEntityGraph("graph.Order.items");

Map hints = new HashMap();
hints.put("javax.persistence.fetchgraph", graph);

return this.em.find(Order.class, 1L, hints);

Hibernate uses this information to create one SQL statement which gets
the attributes of the Order entity and the attributes of the entity
graph from the database:

17:34:51,310 DEBUG [org.hibernate.loader.plan.build.spi.LoadPlanTreePrinter] (pool-2-thread-1)
  LoadPlan(entity=blog.thoughts.on.java.jpa21.entity.graph.model.Order)
    - Returns
      - EntityReturnImpl(
          entity=blog.thoughts.on.java.jpa21.entity.graph.model.Order,
          querySpaceUid=<gen:0>,
          path=blog.thoughts.on.java.jpa21.entity.graph.model.Order)
        - CollectionAttributeFetchImpl(
            collection=blog.thoughts.on.java.jpa21.entity.graph.model.Order.items,
            querySpaceUid=<gen:1>,
            path=blog.thoughts.on.java.jpa21.entity.graph.model.Order.items)
          - (collection element) CollectionFetchableElementEntityGraph(
              entity=blog.thoughts.on.java.jpa21.entity.graph.model.OrderItem,
              querySpaceUid=<gen:2>,
              path=blog.thoughts.on.java.jpa21.entity.graph.model.Order.items.<elements>)
            - EntityAttributeFetchImpl(entity=blog.thoughts.on.java.jpa21.entity.graph.model.Product,
                querySpaceUid=<gen:3>,
                path=blog.thoughts.on.java.jpa21.entity.graph.model.Order.items.<elements>.product)
    - QuerySpaces
      - EntityQuerySpaceImpl(uid=<gen:0>, entity=blog.thoughts.on.java.jpa21.entity.graph.model.Order)
        - SQL table alias mapping - order0_
        - alias suffix - 0_
        - suffixed key columns - {id1_2_0_}
        - JOIN (JoinDefinedByMetadata(items)) : <gen:0> -> <gen:1>
          - CollectionQuerySpaceImpl(uid=<gen:1>,
              collection=blog.thoughts.on.java.jpa21.entity.graph.model.Order.items)
            - SQL table alias mapping - items1_
            - alias suffix - 1_
            - suffixed key columns - {order_id4_2_1_}
            - entity-element alias suffix - 2_
            - 2_entity-element suffixed key columns - id1_0_2_
            - JOIN (JoinDefinedByMetadata(elements)) : <gen:1> -> <gen:2>
              - EntityQuerySpaceImpl(uid=<gen:2>,
                  entity=blog.thoughts.on.java.jpa21.entity.graph.model.OrderItem)
                - SQL table alias mapping - items1_
                - alias suffix - 2_
                - suffixed key columns - {id1_0_2_}
                - JOIN (JoinDefinedByMetadata(product)) : <gen:2> -> <gen:3>
                  - EntityQuerySpaceImpl(uid=<gen:3>,
                      entity=blog.thoughts.on.java.jpa21.entity.graph.model.Product)
                    - SQL table alias mapping - product2_
                    - alias suffix - 3_
                    - suffixed key columns - {id1_1_3_}
17:34:51,311 DEBUG [org.hibernate.loader.entity.plan.EntityLoader] (pool-2-thread-1)
  Static select for entity blog.thoughts.on.java.jpa21.entity.graph.model.Order [NONE:-1]:
  select order0_.id as id1_2_0_,
    order0_.orderNumber as orderNum2_2_0_,
    order0_.version as version3_2_0_,
    items1_.order_id as order_id4_2_1_,
    items1_.id as id1_0_1_,
    items1_.id as id1_0_2_,
    items1_.order_id as order_id4_0_2_,
    items1_.product_id as product_5_0_2_,
    items1_.quantity as quantity2_0_2_,
    items1_.version as version3_0_2_,
    product2_.id as id1_1_3_,
    product2_.name as name2_1_3_,
    product2_.version as version3_1_3_
  from purchase_order order0_
    left outer join OrderItem items1_ on order0_.id=items1_.order_id
    left outer join Product product2_ on items1_.product_id=product2_.id
  where order0_.id=?

Initializing only one relationship is good enough for a blog post but in
a real project, you will most likely want to build more complex graphs.
So let’s do that.
You can, of course, provide an array of @NamedAttributeNode
annotations to fetch multiple attributes of the same entity and you can
use @NamedSubGraph to define the fetching behaviour for an additional
level of entities. I use that in the following code snippet to fetch not
only all related OrderItem entities but also the Product entity for
each OrderItem:

@Entity
@Table(name = "purchase_order")
@NamedEntityGraph(
  name = "graph.Order.items",
  attributeNodes = @NamedAttributeNode(value = "items", subgraph = "items"),
  subgraphs = @NamedSubgraph(name = "items", attributeNodes = @NamedAttributeNode("product")))
public class Order implements Serializable {

    ...
}

As you can see, the definition of a @NamedSubGraph is very similar to
the definition of a @NamedEntityGraph. You can then reference this
subgraph in a @NamedAttributeNode annotation to define the fetching
behaviour for this specific attribute.

The combination of these annotations allows you to define complex entity
graphs which you can use to initialize all relationships you use in your
use case and avoid n+1 select issues. If you want to
specify
your entity graph dynamically at runtime, you can do this also via a
Java API.

2.3. Update Entities One by One

Updating entities one by one feels very natural if you think in an
object oriented way. You just get the entities you want to update and
call a few setter methods to change their attributes like you do it with
any other object.

This approach works fine if you only change a few entities. But it gets
very inefficient when you work with a list of entities and is the third
performance issues you can easily spot in your log file. You just have
to look for a bunch SQL UPDATE statements that look completely the same,
as you can see in the following log file:

22:58:05,829 DEBUG SQL:92 - select
  product0_.id as id1_1_,
  product0_.name as name2_1_,
  product0_.price as price3_1_,
  product0_.version as version4_1_ from Product product0_
22:58:05,883 DEBUG SQL:92 - update Product set name=?, price=?, version=? where id=? and version=?
22:58:05,889 DEBUG SQL:92 - update Product set name=?, price=?, version=? where id=? and version=?
22:58:05,891 DEBUG SQL:92 - update Product set name=?, price=?, version=? where id=? and version=?
22:58:05,893 DEBUG SQL:92 - update Product set name=?, price=?, version=? where id=? and version=?
22:58:05,900 DEBUG SQL:92 - update Product set name=?, price=?, version=? where id=? and version=?

The relational representation of the database records is a much better
fit for these use cases than the object oriented one. With SQL, you
could just write one SQL statement that updates all the records you want
to change.

You can do the same with Hibernate if you use JPQL,
native
SQL or the
CriteriaUpdate
API. All 3 of very similar, so let’s use JPQL in this example.

You can define a JPQL UPDATE statement in a similar way as you know it
from SQL. You just define which entity you want to update, how to change
the values of its attributes and limit the affected entities in the
WHERE statement.
You can see an example of it in the following code snippet where I
increase the price of all products by 10%:

em.createQuery("UPDATE Product p SET p.price = p.price*0.1").executeUpdate();

Hibernate creates an SQL UPDATE statement based on the JPQL statement
and sends it to the database which performs the update operation.

It’s pretty obvious that this approach is a lot faster if you have to
update a huge number entities. But it also has a drawback. Hibernate
doesn’t know which entities are affected by the update operation and
doesn’t update its 1st level cache. You should, therefore, make sure not
to read and update an entity with a JPQL statement within the same
Hibernate Session or you have to detach it to remove it from the cache.

3. Summary

Within this post, I’ve shown you 3 Hibernate performance issues which
you can find in your log files.

2 of them were caused by a huge number of SQL statements. This a common
reason for performance issues, if you work with Hibernate. Hibernate
hides the database access behind its API and that makes it often
difficult to guess the actual number of SQL statements. You should
therefore always check the executed SQL statements when you make a
change to your persistence tier.

getdocs

3043