Multipart Uploads in Amazon S3 with Java

1. Overview

In this tutorial, we’ll see how to handle multipart uploads in Amazon S3 with the AWS SDK for Java.

Simply put, in a multipart upload, we split the content into smaller parts and upload each part individually. Amazon S3 reassembles the parts into a single object once all of them have been received.

Multipart uploads offer the following advantages:

  • Higher throughput – we can upload parts in parallel

  • Easier error recovery – we need to re-upload only the failed parts

  • Pause and resume uploads – we can upload parts at any point in time, so the whole process can be paused and the remaining parts uploaded later

Note that when using multipart upload with Amazon S3, each part except the last part must be at least 5 MB in size.
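To make the splitting rule concrete, here is a minimal JDK-only sketch of how an object could be divided into parts. PartPlanner and Part are hypothetical names used for illustration; they are not SDK classes, and the SDK does this planning for us when we use TransferManager.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: PartPlanner and Part are hypothetical names, not SDK classes.
class PartPlanner {

    // Minimum size of every part except the last one, as stated above.
    static final long MIN_PART_SIZE = 5L * 1024 * 1024;

    // One part of the object: a 1-based part number plus a byte range.
    static final class Part {
        final int number;
        final long offset;
        final long length;

        Part(int number, long offset, long length) {
            this.number = number;
            this.offset = offset;
            this.length = length;
        }
    }

    // Splits contentLength bytes into consecutive parts of partSize bytes;
    // only the final part may be shorter.
    static List<Part> plan(long contentLength, long partSize) {
        if (partSize < MIN_PART_SIZE) {
            throw new IllegalArgumentException("part size is below the 5 MB minimum");
        }
        List<Part> parts = new ArrayList<>();
        long offset = 0;
        int number = 1;
        while (offset < contentLength) {
            long length = Math.min(partSize, contentLength - offset);
            parts.add(new Part(number++, offset, length));
            offset += length;
        }
        return parts;
    }

    public static void main(String[] args) {
        // A 12 MB object with 5 MB parts yields parts of 5 MB, 5 MB and 2 MB.
        for (Part p : plan(12L * 1024 * 1024, MIN_PART_SIZE)) {
            System.out.println("part " + p.number + ": offset=" + p.offset + ", length=" + p.length);
        }
    }
}
```

Only the last part is allowed to fall below the minimum, which is why the planner validates the requested part size but not the length of the final remainder.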

2. Maven Dependencies

Before we begin, we need to add the AWS SDK dependency to our project:

<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk</artifactId>
    <version>1.11.290</version>
</dependency>

To view the latest version, check out Maven Central.

3. Performing Multipart Upload


3.1. Creating Amazon S3 Client

First, we need to create a client for accessing Amazon S3. We’ll use the AmazonS3ClientBuilder for this purpose:

AmazonS3 amazonS3 = AmazonS3ClientBuilder
  .standard()
  .withCredentials(new DefaultAWSCredentialsProviderChain())
  .withRegion(Regions.DEFAULT_REGION)
  .build();

This creates a client using the default credential provider chain for accessing AWS credentials.

For more details on how the default credential provider chain works, please see the documentation. If you’re using a region other than the default (US West-2), make sure you replace Regions.DEFAULT_REGION with that custom region.

3.2. Creating TransferManager for Managing Uploads

We’ll use TransferManagerBuilder to create a TransferManager instance.

This class provides simple APIs to manage uploads and downloads with Amazon S3 and manages all related tasks:

TransferManager tm = TransferManagerBuilder.standard()
  .withS3Client(amazonS3)
  .withMultipartUploadThreshold(5L * 1024 * 1024) // 5 MB in bytes
  .build();

The multipart upload threshold specifies the size, in bytes, above which an upload is performed as a multipart upload.

Amazon S3 imposes a minimum part size of 5 MB (for parts other than the last part), so we use 5 MB as the multipart upload threshold.
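As a quick sanity check on the arithmetic, the sketch below computes 5 MB in bytes and applies the "above the threshold" rule exactly as the text describes it. ThresholdCheck and usesMultipart() are hypothetical names for illustration and are not part of the SDK.

```java
// Illustrative only: ThresholdCheck is a hypothetical name, not an SDK class.
class ThresholdCheck {

    // 5 MB expressed in bytes: 5 * 1024 * 1024 = 5,242,880.
    static final long THRESHOLD = 5L * 1024 * 1024;

    // Mirrors the rule as described in the text: only sizes above the
    // threshold are uploaded as multipart.
    static boolean usesMultipart(long contentLength, long threshold) {
        return contentLength > threshold;
    }

    public static void main(String[] args) {
        System.out.println("threshold in bytes: " + THRESHOLD);
        System.out.println("1 MB file multipart? " + usesMultipart(1024L * 1024, THRESHOLD));
        System.out.println("20 MB file multipart? " + usesMultipart(20L * 1024 * 1024, THRESHOLD));
    }
}
```

Using a long literal (5L) keeps the multiplication in 64-bit arithmetic, which matters once thresholds grow past what an int can hold.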

3.3. Uploading Object

To upload an object using TransferManager, we simply call its upload() method. This uploads the parts in parallel:

String bucketName = "baeldung-bucket";
String keyName = "my-picture.jpg";
File file = new File("documents/my-picture.jpg");
Upload upload = tm.upload(bucketName, keyName, file);

TransferManager.upload() returns an Upload object. This can be used to check the status of and manage uploads. We’ll do so in the next section.

3.4. Waiting For Upload to Complete

TransferManager.upload() is a non-blocking function; it returns immediately while the upload runs in the background.

We can use the returned Upload object to wait for the upload to complete before exiting the program:

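try {
    upload.waitForCompletion();
} catch (AmazonClientException e) {
    // ...
} catch (InterruptedException e) {
    // waitForCompletion() also declares this checked exception;
    // restore the interrupt flag so callers can react to it
    Thread.currentThread().interrupt();
}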
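The start-now, wait-later pattern is not specific to the SDK. As a rough analogy using only the JDK, the sketch below starts background work, gets a handle back immediately, and then blocks on that handle. FakeTransfer is a hypothetical stand-in and does not talk to S3.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// JDK-only analogy; FakeTransfer is a hypothetical stand-in, not an SDK type.
class FakeTransfer {

    // Starts background work and returns a handle immediately,
    // much like a non-blocking upload call.
    static Future<Long> start(ExecutorService pool, long bytes) {
        return pool.submit(() -> {
            Thread.sleep(200); // simulate network time
            return Long.valueOf(bytes);
        });
    }

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<Long> handle = start(pool, 1024);
            System.out.println("done right after start? " + handle.isDone());
            // Block until the background work finishes, in the same spirit
            // as calling waitForCompletion() on an Upload.
            System.out.println("bytes sent: " + handle.get());
        } finally {
            pool.shutdown();
        }
    }
}
```

Without the blocking call, a short-lived program could reach the end of main() while the transfer is still in flight.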

3.5. Tracking the Upload Progress

Tracking the progress of an upload is quite a common requirement; we can do that with the help of a ProgressListener instance:

ProgressListener progressListener = progressEvent -> System.out.println(
  "Transferred bytes: " + progressEvent.getBytesTransferred());
PutObjectRequest request = new PutObjectRequest(
  bucketName, keyName, file);
request.setGeneralProgressListener(progressListener);
Upload upload = tm.upload(request);

The ProgressListener we created will simply continue to print the number of bytes transferred until the upload completes.
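If we want a percentage rather than raw byte counts, we can keep a running total ourselves. The sketch below assumes each progress event reports only the bytes moved since the previous event, and that we know the total size up front (for example from file.length()). ProgressTracker is a hypothetical helper, not an SDK class.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not part of the SDK: sums per-event byte counts.
class ProgressTracker {

    private final long totalBytes;
    // AtomicLong because progress callbacks may arrive from several upload threads.
    private final AtomicLong transferred = new AtomicLong();

    ProgressTracker(long totalBytes) {
        this.totalBytes = totalBytes;
    }

    // Call once per progress event with that event's byte count;
    // returns the running percentage, capped at 100.
    int onBytes(long bytesInEvent) {
        long soFar = transferred.addAndGet(bytesInEvent);
        if (totalBytes == 0) {
            return 100;
        }
        return (int) Math.min(100, soFar * 100 / totalBytes);
    }

    public static void main(String[] args) {
        ProgressTracker tracker = new ProgressTracker(1000);
        // Three simulated events; prints 25, 50 and 100.
        for (long chunk : new long[] {250, 250, 500}) {
            System.out.println(tracker.onBytes(chunk) + "%");
        }
    }
}
```

Inside a real listener, we would pass progressEvent.getBytesTransferred() to onBytes() instead of the simulated chunk sizes.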

3.6. Controlling Upload Parallelism

By default, TransferManager uses a maximum of ten threads to perform multipart uploads.

We can, however, control this by specifying an ExecutorService while building TransferManager:

int maxUploadThreads = 5;
TransferManager tm = TransferManagerBuilder.standard()
  .withS3Client(amazonS3)
  .withMultipartUploadThreshold(5L * 1024 * 1024) // 5 MB in bytes
  .withExecutorFactory(() -> Executors.newFixedThreadPool(maxUploadThreads))
  .build();

Here, we used a lambda to create an ExecutorFactory implementation and passed it to the withExecutorFactory() method.
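To see why the pool size caps parallelism, the JDK-only sketch below submits more tasks than there are threads and records how many ever run at the same time. PoolLimitDemo is a hypothetical name, and the sleeping tasks merely stand in for part uploads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: PoolLimitDemo is a hypothetical name, not an SDK class.
class PoolLimitDemo {

    // Runs taskCount short tasks on a fixed pool and returns the highest
    // number of tasks observed running at the same time.
    static int peakConcurrency(int poolSize, int taskCount) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                futures.add(pool.submit(() -> {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(50); // stand-in for uploading one part
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        running.decrementAndGet();
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for every task to finish
            }
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        // With 5 threads and 20 tasks, the peak never exceeds 5.
        System.out.println("peak concurrency: " + peakConcurrency(5, 20));
    }
}
```

The extra tasks simply queue until a thread becomes free, which is the same effect a smaller pool has on the number of parts uploaded at once.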

4. Conclusion

In this quick article, we learned how to perform multipart uploads using the AWS SDK for Java, and we saw how to control some aspects of an upload and keep track of its progress.

As always, the complete code of this article is available over on GitHub.
