java-stream-reduce

Posted on: 2019-10-08 2019-11-19
Tags: Java Streams

Guide to Stream.reduce()

1. Overview

The Stream API provides a rich repertoire of intermediate, reduction, and terminal functions, which also support parallelization.

More specifically, reduction stream operations allow us to produce one single result from a sequence of elements, by applying repeatedly a combining operation to the elements in the sequence.

In this tutorial, we’ll look at the general-purpose Stream.reduce() operation and see it in some concrete use cases.

2. The Key Concepts: Identity, Accumulator, and Combiner

Before we look deeper into using the Stream.reduce() operation, let’s break down the operation’s participant elements into separate blocks. That way, we’ll understand more easily the role that each one plays:

Identity – an element that is the initial value of the reduction operation and the default result if the stream is empty
Accumulator – a function that takes two parameters: a partial result of the reduction operation and the next element of the stream
Combiner – a function used to combine the partial result of the reduction operation when the reduction is parallelized, or when there’s a mismatch between the types of the accumulator arguments and the types of the accumulator implementation

3. Using Stream.reduce()

To better understand the functionality of the identity, accumulator, and combiner elements, let’s look at some basic examples:

List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6);
int result = numbers
  .stream()
  .reduce(0, (subtotal, element) -> subtotal + element);
assertThat(result).isEqualTo(21);

In this case, the Integer value 0 is the identity. It stores the initial value of the reduction operation, and also the default result when the stream of Integer values is empty.

Likewise, the lambda expression:

subtotal, element -> subtotal + element

is the accumulator, since it takes the partial sum of Integer values and the next element in the stream.

To make the code even more concise, we can use a method reference, instead of a lambda expression:

int result = numbers.stream().reduce(0, Integer::sum);
assertThat(result).isEqualTo(21);

Of course, we can use a reduce() operation on streams holding other types of elements.

For instance, we can use reduce() on an array of String elements and join them into a single result:

List<String> letters = Arrays.asList("a", "b", "c", "d", "e");
String result = letters
  .stream()
  .reduce("", (partialString, element) -> partialString + element);
assertThat(result).isEqualTo("abcde");

Similarly, we can switch to the version that uses a method reference:

String result = letters.stream().reduce("", String::concat);
assertThat(result).isEqualTo("abcde");

Let’s use the reduce() operation for joining the uppercased elements of the letters array:

String result = letters
  .stream()
  .reduce(
    "", (partialString, element) -> partialString.toUpperCase() + element.toUpperCase());
assertThat(result).isEqualTo("ABCDE");

In addition, we can use reduce() in a parallelized stream (more on this later):

List<Integer> ages = Arrays.asList(25, 30, 45, 28, 32);
int computedAges = ages.parallelStream().reduce(0, a, b -> a + b, Integer::sum);

When a stream executes in parallel, the Java runtime splits the stream into multiple substreams. In such cases, we need to use a function to combine the results of the substreams into a single one. This is the role of the combiner – in the above snippet, it’s the Integer::sum method reference.

Funny enough, this code won’t compile:

List<User> users = Arrays.asList(new User("John", 30), new User("Julie", 35));
int computedAges =
  users.stream().reduce(0, (partialAgeResult, user) -> partialAgeResult + user.getAge());

In this case, we have a stream of User objects, and the types of the accumulator arguments are Integer and User. However, the accumulator implementation is a sum of Integers, so the compiler just can’t infer the type of the user parameter.

We can fix this issue by using a combiner:

int result = users.stream()
  .reduce(0, (partialAgeResult, user) -> partialAgeResult + user.getAge(), Integer::sum);
assertThat(result).isEqualTo(65);

To put it simply, if we use sequential streams and the types of the accumulator arguments and the types of its implementation match, we don’t need to use a combiner.

4. Reducing in Parallel

As we learned before, we can use reduce() on parallelized streams.

When we use parallelized streams, we should make sure that reduce() or any other aggregate operations executed on the streams are:

associative: the result is not affected by the order of the operands
non-interfering: the operation doesn’t affect the data source
stateless and deterministic: the operation doesn’t have state and produces the same output for a given input

We should fulfill all these conditions to prevent unpredictable results.

As expected, operations performed on parallelized streams, including reduce(), are executed in parallel, hence taking advantage of multi-core hardware architectures.

For obvious reasons, parallelized streams are much more performant than the sequential counterparts. Even so, they can be overkill if the operations applied to the stream aren’t expensive, or the number of elements in the stream is small.

Of course, parallelized streams are the right way to go when we need to work with large streams and perform expensive aggregate operations.

Let’s create a simple JMH (the Java Microbenchmark Harness) benchmark test and compare the respective execution times when using the reduce() operation on a sequential and a parallelized stream:

@State(Scope.Thread)
private final List<User> userList = createUsers();

@Benchmark
public Integer executeReduceOnParallelizedStream() {
    return this.userList
      .parallelStream()
      .reduce(
        0, (partialAgeResult, user) -> partialAgeResult + user.getAge(), Integer::sum);
}

@Benchmark
public Integer executeReduceOnSequentialStream() {
    return this.userList
      .stream()
      .reduce(
        0, (partialAgeResult, user) -> partialAgeResult + user.getAge(), Integer::sum);
}

In the above JMH benchmark, we compare execution average times. We simply create a List containing a large number of User objects. Next, we call reduce() on a sequential and a parallelized stream and check that the latter performs faster than the former (in seconds-per-operation).

These are our benchmark results:

Benchmark                                                   Mode  Cnt  Score    Error  Units
JMHStreamReduceBenchMark.executeReduceOnParallelizedStream  avgt    5  0,007 ±  0,001   s/op
JMHStreamReduceBenchMark.executeReduceOnSequentialStream    avgt    5  0,010 ±  0,001   s/op

5. Throwing and Handling Exceptions While Reducing

In the above examples, the reduce() operation doesn’t throw any exceptions. But it might, of course.

For instance, say that we need to divide all the elements of a stream by a supplied factor and then sum them:

List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6);
int divider = 2;
int result = numbers.stream().reduce(0, a / divider + b / divider);

This will work, as long as the divider variable is not zero. But if it is zero, reduce() will throw an ArithmeticException exception: divide by zero.

We can easily catch the exception and do something useful with it, such as logging it, recovering from it and so forth, depending on the use case, by using a try/catch block:

public static int divideListElements(List<Integer> values, int divider) {
    return values.stream()
      .reduce(0, (a, b) -> {
          try {
              return a / divider + b / divider;
          } catch (ArithmeticException e) {
              LOGGER.log(Level.INFO, "Arithmetic Exception: Division by Zero");
          }
          return 0;
      });
}

While this approach will work, we polluted the lambda expression with the try/catch block. We no longer have the clean one-liner that we had before.

To fix this issue, we can use https://refactoring.com/catalog/extractFunction.html, and extract the try/catch block into a separate method:

private static int divide(int value, int factor) {
    int result = 0;
    try {
        result = value / factor;
    } catch (ArithmeticException e) {
        LOGGER.log(Level.INFO, "Arithmetic Exception: Division by Zero");
    }
    return result
}

Now, the implementation of the divideListElements() method is again clean and streamlined:

public static int divideListElements(List<Integer> values, int divider) {
    return values.stream().reduce(0, (a, b) -> divide(a, divider) + divide(b, divider));
}

Assuming that divideListElements() is a utility method implemented by an abstract NumberUtils class, we can create a unit test to check the behavior of the divideListElements() method:

List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6);
assertThat(NumberUtils.divideListElements(numbers, 1)).isEqualTo(21);

Let’s also test the divideListElements() method, when the supplied List of Integer values contains a 0:

List<Integer> numbers = Arrays.asList(0, 1, 2, 3, 4, 5, 6);
assertThat(NumberUtils.divideListElements(numbers, 1)).isEqualTo(21);

Finally, let’s test the method implementation when the divider is 0, too:

List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6);
assertThat(NumberUtils.divideListElements(numbers, 0)).isEqualTo(0);

6. Conclusion

In this tutorial, we learned how to use the Stream.reduce() operation. In addition, we learned how to perform reductions on sequential and parallelized streams, and how to handle exceptions while reducing.

As usual, all the code samples shown in this tutorial are available over on GitHub.

getdocs

3043