Guide to Java URL Encoding/Decoding

1. Introduction

Simply put, URL encoding translates special characters from the URL to a representation that adheres to the spec and can be correctly understood and interpreted.

In this article, we’ll focus on how to encode and decode URLs or form data so that they adhere to the spec and are transmitted over the network correctly.

2. Analyze the URL

A basic URI syntax can be generalized as:

scheme:[//[user:password@]host[:port]][/]path[?query][#fragment]

The first step in encoding a URI is to examine its parts and then encode only the relevant portions.

Let us look at an example of a URI:

String testUrl =
  "http://www.baeldung.com?key1=value+1&key2=value%40%21%242&key3=value%253";

One way of analyzing the URI is to load its String representation into a java.net.URI instance:

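import static org.hamcrest.CoreMatchers.is;
import static org.hamcrest.MatcherAssert.assertThat;

import java.net.URI;
import org.junit.Test;

@Test
public void givenURL_whenAnalyze_thenCorrect() throws Exception {
    URI uri = new URI(testUrl);

    assertThat(uri.getScheme(), is("http"));
    assertThat(uri.getHost(), is("www.baeldung.com"));
    // getRawQuery returns the query exactly as it appears in the string, still encoded
    assertThat(uri.getRawQuery(),
      is("key1=value+1&key2=value%40%21%242&key3=value%253"));
}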

The URI class parses the string representation of the URL and exposes its parts through a simple API of getXXX methods, such as getScheme, getHost, and getRawQuery.
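To see the remaining parts of the generic syntax in action, here is a minimal sketch that parses a made-up URI; the user info, port, path, and fragment values, as well as the UriPartsDemo class, are illustrative and not part of the article's test data. It also shows the difference between getRawQuery, which returns the query as written, and getQuery, which decodes percent-escaped octets:

```java
import java.net.URI;

public class UriPartsDemo {

    // URI.create wraps the URI(String) constructor and throws an unchecked
    // IllegalArgumentException if the syntax is invalid.
    static URI parse(String uri) {
        return URI.create(uri);
    }

    public static void main(String[] args) {
        // Hypothetical URI exercising every part of the generic syntax
        URI uri = parse("http://user:secret@www.baeldung.com:8080/docs/page?key=value%40#top");

        System.out.println(uri.getScheme());   // http
        System.out.println(uri.getUserInfo()); // user:secret
        System.out.println(uri.getHost());     // www.baeldung.com
        System.out.println(uri.getPort());     // 8080
        System.out.println(uri.getPath());     // /docs/page
        System.out.println(uri.getRawQuery()); // key=value%40 (still encoded)
        System.out.println(uri.getQuery());    // key=value@ (decoded)
        System.out.println(uri.getFragment()); // top
    }
}
```

Because getQuery already decodes escapes, splitting its result on “&” is unreliable; working from getRawQuery, as the test above does, keeps the delimiters unambiguous.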

3. Encode the URL

When encoding a URI, a common pitfall is encoding the complete URI. Typically, we need to encode only the query portion of the URI, that is, the individual parameter values.
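A minimal sketch of this pitfall follows. It assumes Java 10 or later for the URLEncoder.encode(String, Charset) overload, and the WholeUrlPitfall class is an illustrative name:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class WholeUrlPitfall {

    // Form-encodes a string with UTF-8 using the Charset overload (Java 10+)
    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Wrong: ':', '/', '?' and '=' are escaped too, so the result is no longer a usable URL
        System.out.println(encode("http://www.baeldung.com?key1=value 1"));
        // http%3A%2F%2Fwww.baeldung.com%3Fkey1%3Dvalue+1

        // Right: keep the URL structure and encode only the parameter value
        System.out.println("http://www.baeldung.com?key1=" + encode("value 1"));
        // http://www.baeldung.com?key1=value+1
    }
}
```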

Let’s encode the data using the encode(data, encodingScheme) method of the URLEncoder class:

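import static java.util.stream.Collectors.joining;

import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

private String encodeValue(String value) {
    try {
        return URLEncoder.encode(value, StandardCharsets.UTF_8.toString());
    } catch (UnsupportedEncodingException ex) {
        // UTF-8 is always supported by the JVM, so this should never happen
        throw new IllegalStateException(ex);
    }
}

@Test
public void givenRequestParam_whenUTF8Scheme_thenEncode() throws Exception {
    // LinkedHashMap keeps insertion order, so the parameters are joined as key1, key2, key3
    Map<String, String> requestParams = new LinkedHashMap<>();
    requestParams.put("key1", "value 1");
    requestParams.put("key2", "value@!$2");
    requestParams.put("key3", "value%3");

    String encodedURL = requestParams.keySet().stream()
      .map(key -> key + "=" + encodeValue(requestParams.get(key)))
      .collect(joining("&", "http://www.baeldung.com?", ""));

    assertThat(encodedURL, is(testUrl));
}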

The encode method accepts two parameters:

  1. data – string to be translated

  2. encodingScheme – name of the character encoding

This encode method converts the string into application/x-www-form-urlencoded format.

Letters, digits, “.”, “-”, “*”, and “_” stay as they are, and a space becomes “+”. Every other character is converted into one or more bytes using the given encoding scheme, and each byte is represented as “%xy”, where xy is its two-digit hexadecimal value. When we build a query from dynamic values, we encode each value before sending the request to the server.
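The byte-by-byte behavior can be checked with a short sketch; it assumes Java 10 or later for the Charset overload, and PercentEncodingDemo is an illustrative name. Non-ASCII characters are written as Unicode escapes so the source compiles regardless of the platform's default encoding:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PercentEncodingDemo {

    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(encode("a-b_c.d*e")); // a-b_c.d*e  (left unchanged)
        System.out.println(encode("a b"));       // a+b        (space becomes '+')
        System.out.println(encode("@"));         // %40        (one UTF-8 byte)
        System.out.println(encode("\u00E9"));    // %C3%A9     ('é', two UTF-8 bytes)
        System.out.println(encode("\u20AC"));    // %E2%82%AC  ('€', three UTF-8 bytes)
    }
}
```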

Note: The World Wide Web Consortium Recommendation states that UTF-8 should be used. Not doing so may introduce incompatibilities. (Reference: https://docs.oracle.com/javase/7/docs/api/java/net/URLEncoder.html)
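To illustrate why the choice of encoding matters, this sketch (again assuming Java 10 or later, with an illustrative class name) encodes the same value with UTF-8 and with ISO-8859-1. The two results differ, so a server expecting UTF-8 would misread the second one:

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetChoiceDemo {

    static String encode(String value, Charset charset) {
        return URLEncoder.encode(value, charset);
    }

    public static void main(String[] args) {
        String value = "caf\u00E9"; // "café"
        System.out.println(encode(value, StandardCharsets.UTF_8));      // caf%C3%A9
        System.out.println(encode(value, StandardCharsets.ISO_8859_1)); // caf%E9
    }
}
```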

4. Decode the URL

Let us now decode the previous URL using the decode method of the URLDecoder class:

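import static org.junit.Assert.assertEquals;

import java.net.URLDecoder;
import java.util.Arrays;
import java.util.stream.Collectors;

private String decode(String value) {
    try {
        return URLDecoder.decode(value, StandardCharsets.UTF_8.toString());
    } catch (UnsupportedEncodingException ex) {
        // UTF-8 is always supported by the JVM, so this should never happen
        throw new IllegalStateException(ex);
    }
}

@Test
public void givenRequestParam_whenUTF8Scheme_thenDecodeRequestParams() throws Exception {
    URI uri = new URI(testUrl);

    String scheme = uri.getScheme();
    String host = uri.getHost();
    String query = uri.getRawQuery();

    // Split the raw query into key=value pairs first, then decode each value
    String decodedQuery = Arrays.stream(query.split("&"))
      .map(param -> param.split("=")[0] + "=" + decode(param.split("=")[1]))
      .collect(Collectors.joining("&"));

    assertEquals(
      "http://www.baeldung.com?key1=value 1&key2=value@!$2&key3=value%3",
      scheme + "://" + host + "?" + decodedQuery);
}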

The two important bits here are:

  • analyze the URL before decoding

  • use the same encoding scheme for encoding and decoding

If we were to decode and then analyze, URL portions might not be parsed correctly: an encoded “&” or “=” inside a value would turn into a real delimiter. If we used a different encoding scheme to decode the data, we would get garbled data.
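Both pitfalls can be reproduced in a small sketch. It assumes Java 10 or later for the URLDecoder.decode(String, Charset) overload; the class name, method names, and the query "q=a%26b%3Dc&x=1" are illustrative:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DecodeOrderDemo {

    // Pitfall: decoding first turns %26 into a real '&', so the query splits into too many parameters
    static List<String> decodeThenSplit(String rawQuery) {
        return List.of(URLDecoder.decode(rawQuery, StandardCharsets.UTF_8).split("&"));
    }

    // Correct order: split the raw query on '&' and '=' first, then decode each part.
    // Assumes every pair contains an '='.
    static Map<String, String> splitThenDecode(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : rawQuery.split("&")) {
            String[] kv = pair.split("=", 2);
            params.put(URLDecoder.decode(kv[0], StandardCharsets.UTF_8),
                       URLDecoder.decode(kv[1], StandardCharsets.UTF_8));
        }
        return params;
    }

    // Pitfall: UTF-8 bytes decoded as ISO-8859-1 produce garbled text
    static String decodeWithWrongCharset(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String rawQuery = "q=a%26b%3Dc&x=1";
        System.out.println(decodeThenSplit(rawQuery)); // [q=a, b=c, x=1]
        System.out.println(splitThenDecode(rawQuery)); // {q=a&b=c, x=1}
        System.out.println(decodeWithWrongCharset("caf%C3%A9")); // "cafÃ©" instead of "café"
    }
}
```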

5. Encode a Path Segment

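URLEncoder should not be used for encoding a path segment of the URL, because it implements form encoding, which is meant for query parameter values. The path component is the hierarchical part of the URL: a sequence of segments separated by “/” that typically represents a directory path or locates a resource.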

Reserved characters in a path segment are different from those in query parameter values. For example, a “+” sign is a valid character in a path segment and therefore should not be encoded, whereas URLEncoder would turn it into “%2B” and would turn a space into “+”.
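To make the difference concrete, here is a minimal JDK-only sketch (Java 10 or later assumed; class and method names are illustrative). It contrasts URLEncoder with the multi-argument java.net.URI constructor, which quotes only characters that are illegal in a path. This is an alternative technique shown for comparison, not the Spring-based approach used in this section:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PathEncodingDemo {

    // Form encoding: ' ' becomes '+', '+' becomes "%2B" and '/' becomes "%2F"
    static String formEncode(String path) {
        return URLEncoder.encode(path, StandardCharsets.UTF_8);
    }

    // Path encoding via URI(scheme, host, path, fragment): illegal ASCII characters
    // such as ' ' are percent-encoded, while '/' and '+' are kept
    static String uriEncodePath(String path) {
        try {
            return new URI(null, null, path, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String path = "/Path 1/Path+2";
        System.out.println(formEncode(path));    // %2FPath+1%2FPath%2B2
        System.out.println(uriEncodePath(path)); // /Path%201/Path+2
    }
}
```

One caveat: this constructor leaves non-ASCII characters unquoted in getRawPath(), while toASCIIString() percent-encodes them.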

To encode a path segment, we use the UriUtils class from the Spring Framework instead. UriUtils provides the encodePath and encodePathSegment methods for encoding a whole path and a single path segment, respectively.

Let’s look at an example:

import java.nio.charset.StandardCharsets;
import org.springframework.web.util.UriUtils;

private String encodePath(String path) {
    // The Charset overload (Spring Framework 5.0+) throws no checked exception
    return UriUtils.encodePath(path, StandardCharsets.UTF_8);
}

@Test
public void givenPathSegment_thenEncodeDecode() {
    String pathSegment = "/Path 1/Path+2";
    String encodedPathSegment = encodePath(pathSegment);
    String decodedPathSegment = UriUtils.decode(encodedPathSegment, StandardCharsets.UTF_8);

    assertEquals("/Path%201/Path+2", encodedPathSegment);
    assertEquals("/Path 1/Path+2", decodedPathSegment);
}

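In the above code snippet, we can see that the encodePath method returned the encoded value: the space became %20, while + was not encoded because it is a valid character in the path component. The “/” separators were also preserved, because encodePath treats them as part of the path structure; encodePathSegment, by contrast, would encode them as %2F.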

Let us add a path variable to our test URL:

String testUrl
  = "/path+1?key1=value+1&key2=value%40%21%242&key3=value%253";

To assemble and assert a properly encoded URL, let us change the test from section 3:

String path = "path+1";
String encodedURL = requestParams.keySet().stream()
  .map(k -> k + "=" + encodeValue(requestParams.get(k)))
  .collect(joining("&", "/" + encodePath(path) + "?", ""));
assertThat(encodedURL, CoreMatchers.is(testUrl));

6. Conclusion

In this tutorial, we have seen how to encode and decode the data so that it can be transferred and interpreted correctly. While the article focused on encoding/decoding URI query parameter values, the approach applies to HTML form parameters as well.

You can find the source code over on GitHub.
