Spring Data JPA Batch Inserts

1. Overview

Every round trip to the database is expensive. We can often improve performance by sending multiple inserts to the database in a single batch.

In this tutorial, we’ll look at how to do this with Spring Data JPA.

2. Spring JPA Repository

First, we’ll need a simple entity. Let’s call it Customer:

@Entity
public class Customer {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;
    private String firstName;
    private String lastName;

    // constructor, getters, setters
}

And then, we need our repository:

public interface CustomerRepository extends CrudRepository<Customer, Long> {
}

This exposes a saveAll method for us, which saves several entities in one call and, once batching is configured, lets Hibernate send their inserts as a batch.

So, let’s leverage that in a controller:

@RestController
public class CustomerController {
    @Autowired
    CustomerRepository customerRepository;

    @PostMapping("/customers")
    public ResponseEntity<String> insertCustomers() {
        Customer c1 = new Customer("James", "Gosling");
        Customer c2 = new Customer("Doug", "Lea");
        Customer c3 = new Customer("Martin", "Fowler");
        Customer c4 = new Customer("Brian", "Goetz");
        List<Customer> customers = Arrays.asList(c1, c2, c3, c4);
        customerRepository.saveAll(customers);
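        // created() expects a java.net.URI and returns a builder, so we finish with build()
        return ResponseEntity.created(URI.create("/customers")).build();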
    }

    // ... @GetMapping to read customers
}

3. Testing Our Endpoint

Testing our code is simple with MockMvc:

@Autowired
private MockMvc mockMvc;

@Test
public void whenInsertingCustomers_thenCustomersAreCreated() throws Exception {
    this.mockMvc.perform(post("/customers"))
      .andExpect(status().isCreated());
}

4. Are We Sure We’re Batching?

Actually, there’s a bit more configuration to do. Let’s run a quick demo to illustrate the difference.

First, let’s add the following property to application.properties to see some statistics:

spring.jpa.properties.hibernate.generate_statistics=true

At this point, if we run the test, we’ll see stats like the following:

11232586 nanoseconds spent preparing 4 JDBC statements;
4076610 nanoseconds spent executing 4 JDBC statements;
0 nanoseconds spent executing 0 JDBC batches;

So, we created four customers, which is great, but note that none of them were inside a batch.

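The reason is that Hibernate doesn’t batch JDBC statements by default. Batching only kicks in once we set the hibernate.jdbc.batch_size property.

So, out of the box, saveAll sends each insert separately. The id generation strategy matters too: Hibernate disables insert batching for entities that use GenerationType.IDENTITY, regardless of the batch size.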

So, let’s switch it on:

spring.jpa.properties.hibernate.jdbc.batch_size=4
spring.jpa.properties.hibernate.order_inserts=true

The first property tells Hibernate to collect inserts in batches of four. The order_inserts property tells Hibernate to take the time to group inserts by entity, creating larger batches.
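To see why grouping by entity creates larger batches, here is a minimal sketch in plain Java. It is a toy model, not Hibernate code: BatchingSketch, countBatches, and orderByEntity are hypothetical names, Address is a made-up second entity, and the plain sort is a simplification of what order_inserts does. It assumes that a batch can only hold repetitions of the same statement, so a batch is flushed whenever the statement changes or the batch size is reached:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model of JDBC batching; all names here are hypothetical, not Hibernate APIs.
final class BatchingSketch {

    // Counts the batches sent when consecutive identical statements are
    // collected together, with at most batchSize statements per batch.
    static int countBatches(List<String> statements, int batchSize) {
        int batches = 0;
        int inCurrentBatch = 0;
        String current = null;
        for (String statement : statements) {
            boolean statementChanged = current != null && !current.equals(statement);
            if (statementChanged || inCurrentBatch == batchSize) {
                batches++;           // flush what has been collected so far
                inCurrentBatch = 0;
            }
            current = statement;
            inCurrentBatch++;
        }
        // Count the last, possibly partial, batch
        return inCurrentBatch > 0 ? batches + 1 : batches;
    }

    // Simplified stand-in for order_inserts: puts identical statements next to each other.
    static List<String> orderByEntity(List<String> statements) {
        List<String> ordered = new ArrayList<>(statements);
        ordered.sort(Comparator.naturalOrder());
        return ordered;
    }

    public static void main(String[] args) {
        List<String> interleaved = List.of(
            "insert Customer", "insert Address", "insert Customer", "insert Address");
        System.out.println(countBatches(interleaved, 4));                // prints 4
        System.out.println(countBatches(orderByEntity(interleaved), 4)); // prints 2
    }
}
```

With interleaved entity types, every statement change forces a flush, so a batch size of four buys us nothing. Once the inserts are grouped, the same four statements fit into two batches. Four inserts for a single entity, as in our Customer example, fit into one.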

So, the second time we run our test, we’ll see the inserts were batched:

16577314 nanoseconds spent preparing 4 JDBC statements;
2207548 nanoseconds spent executing 4 JDBC statements;
2003005 nanoseconds spent executing 1 JDBC batches;

We can apply the same approach to deletes and updates (remembering that Hibernate also has an order_updates property).
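As a sketch, the equivalent application.properties settings for updates might look like this. The batch size of four is simply carried over from our insert example, and order_updates is the Hibernate property mentioned above:

```properties
spring.jpa.properties.hibernate.jdbc.batch_size=4
spring.jpa.properties.hibernate.order_updates=true
```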

5. Conclusion

By batching inserts, we send fewer round trips to the database, which can bring noticeable performance gains.

We, of course, need to be aware that batching is automatically disabled in some cases, and we should check and plan for this before we ship.

Make sure to check out all these code snippets over on GitHub.
