introduction-to-spring-batch

Posted on: 2019-10-08 2019-11-19
Tags: Spring Batch

Introduction to Spring Batch

1. Introduction

In this article we’re going to focus on a practical, code-focused intro to Spring Batch. Spring Batch is a processing framework designed for robust execution of jobs.

It’s current version 3.0, which supports Spring 4 and Java 8. It also accommodates JSR-352, which is new java specification for batch processing.

Here are a few interesting and practical use-cases of the framework.

2. Workflow Basics

Spring batch follows the traditional batch architecture where a job repository does the work of scheduling and interacting with the job.

A job can have more than one steps – and every step typically follows the sequence of reading data, processing it and writing it.

And of course the framework will do most of the heavy lifting for us here – especially when it comes to the low level persistence work of dealing with the jobs – using sqlite for the job repository.

2.1. Our Example Usecase

The simple usecase we’re going to tackle here is – we’re going to migrate some financial transaction data from CSV to XML.

The input file has a very simple structure – it contains a transaction per line, made up of: a username, the user id, the date of the transaction and the amount:

username, userid, transaction_date, transaction_amount
devendra, 1234, 31/10/2015, 10000
john, 2134, 3/12/2015, 12321
robin, 2134, 2/02/2015, 23411

3. The Maven POM

Dependencies required for this project are spring core, spring batch, and sqlite jdbc connector:

<project xmlns="http://maven.apache.org/POM/4.0.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
  http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.baeldung</groupId>
    <artifactId>spring-batch-intro</artifactId>
    <version>0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>spring-batch-intro</name>
    <url>http://maven.apache.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <spring.version>4.2.0.RELEASE</spring.version>
        <spring.batch.version>3.0.5.RELEASE</spring.batch.version>
        <sqlite.version>3.8.11.2</sqlite.version>
    </properties>

    <dependencies>
        <!-- SQLite database driver -->
        <dependency>
            <groupId>org.xerial</groupId>
            <artifactId>sqlite-jdbc</artifactId>
            <version>${sqlite.version}</version>
        </dependency>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-oxm</artifactId>
            <version>${spring.version}</version>
        </dependency>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-jdbc</artifactId>
            <version>${spring.version}</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.batch</groupId>
            <artifactId>spring-batch-core</artifactId>
            <version>${spring.batch.version}</version>
        </dependency>
    </dependencies>
</project>

4. Spring Batch Config

First thing we’ll do is to configure Spring Batch with XML:

<beans xmlns="http://www.springframework.org/schema/beans"
  xmlns:jdbc="http://www.springframework.org/schema/jdbc"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="
    http://www.springframework.org/schema/beans
    http://www.springframework.org/schema/beans/spring-beans-4.2.xsd
    http://www.springframework.org/schema/jdbc
    http://www.springframework.org/schema/jdbc/spring-jdbc-4.2.xsd
">

    <!-- connect to SQLite database -->
    <bean id="dataSource"
      class="org.springframework.jdbc.datasource.DriverManagerDataSource">
        <property name="driverClassName" value="org.sqlite.JDBC" />
        <property name="url" value="jdbc:sqlite:repository.sqlite" />
        <property name="username" value="" />
        <property name="password" value="" />
    </bean>

    <!-- create job-meta tables automatically -->
    <jdbc:initialize-database data-source="dataSource">
        <jdbc:script
          location="org/springframework/batch/core/schema-drop-sqlite.sql" />
        <jdbc:script location="org/springframework/batch/core/schema-sqlite.sql" />
    </jdbc:initialize-database>

    <!-- stored job-meta in memory -->
    <!--
    <bean id="jobRepository"
      class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean">
        <property name="transactionManager" ref="transactionManager" />
    </bean>
    -->

    <!-- stored job-meta in database -->
    <bean id="jobRepository"
      class="org.springframework.batch.core.repository.support.JobRepositoryFactoryBean">
        <property name="dataSource" ref="dataSource" />
        <property name="transactionManager" ref="transactionManager" />
        <property name="databaseType" value="sqlite" />
    </bean>

    <bean id="transactionManager" class=
      "org.springframework.batch.support.transaction.ResourcelessTransactionManager" />

    <bean id="jobLauncher"
      class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
        <property name="jobRepository" ref="jobRepository" />
    </bean>

</beans>

Of course a Java configuration is also available:

@Configuration
@EnableBatchProcessing
public class SpringConfig {

    @Value("org/springframework/batch/core/schema-drop-sqlite.sql")
    private Resource dropReopsitoryTables;

    @Value("org/springframework/batch/core/schema-sqlite.sql")
    private Resource dataReopsitorySchema;

    @Bean
    public DataSource dataSource() {
        DriverManagerDataSource dataSource = new DriverManagerDataSource();
        dataSource.setDriverClassName("org.sqlite.JDBC");
        dataSource.setUrl("jdbc:sqlite:repository.sqlite");
        return dataSource;
    }

    @Bean
    public DataSourceInitializer dataSourceInitializer(DataSource dataSource)
      throws MalformedURLException {
        ResourceDatabasePopulator databasePopulator =
          new ResourceDatabasePopulator();

        databasePopulator.addScript(dropReopsitoryTables);
        databasePopulator.addScript(dataReopsitorySchema);
        databasePopulator.setIgnoreFailedDrops(true);

        DataSourceInitializer initializer = new DataSourceInitializer();
        initializer.setDataSource(dataSource);
        initializer.setDatabasePopulator(databasePopulator);

        return initializer;
    }

    private JobRepository getJobRepository() throws Exception {
        JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
        factory.setDataSource(dataSource());
        factory.setTransactionManager(getTransactionManager());
        factory.afterPropertiesSet();
        return (JobRepository) factory.getObject();
    }

    private PlatformTransactionManager getTransactionManager() {
        return new ResourcelessTransactionManager();
    }

    public JobLauncher getJobLauncher() throws Exception {
        SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
        jobLauncher.setJobRepository(getJobRepository());
        jobLauncher.afterPropertiesSet();
        return jobLauncher;
    }
}

5. Spring Batch Job Config

Let’s now write our job description for the CSV to XML work:

<beans xmlns="http://www.springframework.org/schema/beans"
  xmlns:batch="http://www.springframework.org/schema/batch"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.springframework.org/schema/batch
    http://www.springframework.org/schema/batch/spring-batch-3.0.xsd
    http://www.springframework.org/schema/beans
    http://www.springframework.org/schema/beans/spring-beans-4.2.xsd
">

    <import resource="spring.xml" />

    <bean id="record" class="org.baeldung.spring_batch_intro.model.Transaction"></bean>
    <bean id="itemReader"
      class="org.springframework.batch.item.file.FlatFileItemReader">

        <property name="resource" value="input/record.csv" />

        <property name="lineMapper">
            <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
                <property name="lineTokenizer">
                    <bean class=
                      "org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
                        <property name="names" value="username,userid,transactiondate,amount" />
                    </bean>
                </property>
                <property name="fieldSetMapper">
                    <bean class="org.baeldung.spring_batch_intro.service.RecordFieldSetMapper" />
                </property>
            </bean>
        </property>
    </bean>

    <bean id="itemProcessor"
      class="org.baeldung.spring_batch_intro.service.CustomItemProcessor" />

    <bean id="itemWriter"
      class="org.springframework.batch.item.xml.StaxEventItemWriter">
        <property name="resource" value="file:xml/output.xml" />
        <property name="marshaller" ref="recordMarshaller" />
        <property name="rootTagName" value="transactionRecord" />
    </bean>

    <bean id="recordMarshaller" class="org.springframework.oxm.jaxb.Jaxb2Marshaller">
        <property name="classesToBeBound">
            <list>
                <value>org.baeldung.spring_batch_intro.model.Transaction</value>
            </list>
        </property>
    </bean>
    <batch:job id="firstBatchJob">
        <batch:step id="step1">
            <batch:tasklet>
                <batch:chunk reader="itemReader" writer="itemWriter"
                  processor="itemProcessor" commit-interval="10">
                </batch:chunk>
            </batch:tasklet>
        </batch:step>
    </batch:job>
</beans>

And of course, the similar java based job config:

public class SpringBatchConfig {

    @Autowired
    private JobBuilderFactory jobs;

    @Autowired
    private StepBuilderFactory steps;

    @Value("input/record.csv")
    private Resource inputCsv;

    @Value("file:xml/output.xml")
    private Resource outputXml;

    @Bean
    public ItemReader<Transaction> itemReader()
      throws UnexpectedInputException, ParseException {
        FlatFileItemReader<Transaction> reader = new FlatFileItemReader<Transaction>();
        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
        String[] tokens = { "username", "userid", "transactiondate", "amount" };
        tokenizer.setNames(tokens);
        reader.setResource(inputCsv);
        DefaultLineMapper<Transaction> lineMapper =
          new DefaultLineMapper<Transaction>();
        lineMapper.setLineTokenizer(tokenizer);
        lineMapper.setFieldSetMapper(new RecordFieldSetMapper());
        reader.setLineMapper(lineMapper);
        return reader;
    }

    @Bean
    public ItemProcessor<Transaction, Transaction> itemProcessor() {
        return new CustomItemProcessor();
    }

    @Bean
    public ItemWriter<Transaction> itemWriter(Marshaller marshaller)
      throws MalformedURLException {
        StaxEventItemWriter<Transaction> itemWriter =
          new StaxEventItemWriter<Transaction>();
        itemWriter.setMarshaller(marshaller);
        itemWriter.setRootTagName("transactionRecord");
        itemWriter.setResource(outputXml);
        return itemWriter;
    }

    @Bean
    public Marshaller marshaller() {
        Jaxb2Marshaller marshaller = new Jaxb2Marshaller();
        marshaller.setClassesToBeBound(new Class[] { Transaction.class });
        return marshaller;
    }

    @Bean
    protected Step step1(ItemReader<Transaction> reader,
      ItemProcessor<Transaction, Transaction> processor,
      ItemWriter<Transaction> writer) {
        return steps.get("step1").<Transaction, Transaction> chunk(10)
          .reader(reader).processor(processor).writer(writer).build();
    }

    @Bean(name = "firstBatchJob")
    public Job job(@Qualifier("step1") Step step1) {
        return jobs.get("firstBatchJob").start(step1).build();
    }
}

OK, so now that we have the whole config, let’s break it down and start discussing it.

5.1. Read Data and Create Objects with ItemReader

First we configured the cvsFileItemReader which will read the data from the record.csv and convert it into the Transaction object:

@SuppressWarnings("restriction")
@XmlRootElement(name = "transactionRecord")
public class Transaction {
    private String username;
    private int userId;
    private Date transactionDate;
    private double amount;

    /* getters and setters for the attributes */

    @Override
    public String toString() {
        return "Transaction [username=" + username + ", userId=" + userId
          + ", transactionDate=" + transactionDate + ", amount=" + amount
          + "]";
    }
}

To do so – it uses a custom mapper:

public class RecordFieldSetMapper implements FieldSetMapper<Transaction> {

    public Transaction mapFieldSet(FieldSet fieldSet) throws BindException {
        SimpleDateFormat dateFormat = new SimpleDateFormat("dd/MM/yyyy");
        Transaction transaction = new Transaction();

        transaction.setUsername(fieldSet.readString("username"));
        transaction.setUserId(fieldSet.readInt(1));
        transaction.setAmount(fieldSet.readDouble(3));
        String dateString = fieldSet.readString(2);
        try {
            transaction.setTransactionDate(dateFormat.parse(dateString));
        } catch (ParseException e) {
            e.printStackTrace();
        }
        return transaction;
    }
}

5.2. Processing Data with ItemProcessor

We have created our own item processor, CustomItemProcessor. This doesn’t process anything related to the transaction object – all it does is passes the original object coming from reader to the writer:

public class CustomItemProcessor implements ItemProcessor<Transaction, Transaction> {

    public Transaction process(Transaction item) {
        return item;
    }
}

5.3. Writing Objects to the FS with ItemWriter

Finally, we are going to store this transaction into an xml file located at xml/output.xml:

<bean id="itemWriter"
  class="org.springframework.batch.item.xml.StaxEventItemWriter">
    <property name="resource" value="file:xml/output.xml" />
    <property name="marshaller" ref="recordMarshaller" />
    <property name="rootTagName" value="transactionRecord" />
</bean>

5.4. Configuring the Batch Job

So all we have to do is connect the dots with a job – using the batch:job syntax.

Note the commit-interval – that’s the number of transactions to be kept in memory before committing the batch to the itemWriter; it will hold the transactions in memory until that point (or until the end of the input data is encountered):

<batch:job id="firstBatchJob">
    <batch:step id="step1">
        <batch:tasklet>
            <batch:chunk reader="itemReader" writer="itemWriter"
              processor="itemProcessor" commit-interval="10">
            </batch:chunk>
        </batch:tasklet>
    </batch:step>
</batch:job>

5.5. Running the Batch Job

That’s it – let’s now set up and run everything:

public class App {
    public static void main(String[] args) {
        // Spring Java config
        AnnotationConfigApplicationContext context = new AnnotationConfigApplicationContext();
        context.register(SpringConfig.class);
        context.register(SpringBatchConfig.class);
        context.refresh();

        JobLauncher jobLauncher = (JobLauncher) context.getBean("jobLauncher");
        Job job = (Job) context.getBean("firstBatchJob");
        System.out.println("Starting the batch job");
        try {
            JobExecution execution = jobLauncher.run(job, new JobParameters());
            System.out.println("Job Status : " + execution.getStatus());
            System.out.println("Job completed");
        } catch (Exception e) {
            e.printStackTrace();
            System.out.println("Job failed");
        }
    }
}

6. Conclusion

This tutorial gives you a basic idea of how to work with Spring Batch and how to use it in a simple usecase.

It shows how you can easily develop your batch processing pipeline and how you can customize different stages in reading, processing and writing.

The full implementation of this tutorial can be found in the github project – this is an Eclipse based project, so it should be easy to import and run as it is.

getdocs

3043