Get Substring from String in Java

1. Overview

In this tutorial, we’ll find out how to get a substring from a String in Java.

We’ll mostly use methods from the String class, along with a few from Apache Commons’ StringUtils class.

All of the examples work on the following String:

String text = "Julia Evans was born on 25-09-1984. "
  + "She is currently living in the USA (United States of America).";

2. Maven Dependencies

To use the StringUtils class, which is part of the Apache Commons Lang library, we need to add the following dependency to the Maven project:

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-lang3</artifactId>
    <version>3.8</version>
</dependency>

The latest version of this library can be found on Maven Central.

3. Using substring

This is the most frequently used method to extract a substring. We provide a start index (inclusive) and, optionally, an end index (exclusive), and get back the characters in between.

If we don’t specify the end index, the substring extends to the end of the String.

Let’s say we want to extract Julia’s country of residence:

assertEquals("USA (United States of America).",
  text.substring(67));

To get rid of the period at the end of the output, we can use the two-argument version of the same method:

assertEquals("USA (United States of America)",
  text.substring(67, text.length() - 1));

In the examples above, we’ve used the exact position to extract the substring.
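As a quick illustration of the index rules, the sketch below shows that an end index beyond the length of the String is not clamped by substring itself (it throws a StringIndexOutOfBoundsException). The class name SubstringBounds and the safeSubstring helper are invented for this example; clamping the indices is just one possible way to avoid the exception, not something the String class does for us.

```java
class SubstringBounds {

    static final String TEXT = "Julia Evans was born on 25-09-1984. "
      + "She is currently living in the USA (United States of America).";

    // Hypothetical helper: clamps both indices into [0, length] before delegating to substring.
    static String safeSubstring(String s, int begin, int end) {
        int from = Math.max(0, Math.min(begin, s.length()));
        int to = Math.max(from, Math.min(end, s.length()));
        // begin is inclusive, end is exclusive: the result has (to - from) characters
        return s.substring(from, to);
    }
}
```

With the example text, safeSubstring(TEXT, 67, 1_000) returns the same value as TEXT.substring(67), whereas TEXT.substring(67, 1_000) would throw.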

3.1. Getting a Substring Starting at a Specific Character

If the position needs to be calculated dynamically based on a character or String, we can use the indexOf method:

assertEquals("United States of America",
  text.substring(text.indexOf('(') + 1, text.indexOf(')')));

A similar method that can help us locate our substring is lastIndexOf. Let’s use lastIndexOf to extract the year “1984”. It’s the portion of text between the last dash and the first dot:

assertEquals("1984",
  text.substring(text.lastIndexOf('-') + 1, text.indexOf('.')));

Both indexOf and lastIndexOf can take a character or a String as a parameter. Let’s extract the text “USA” and the rest of the text in the parentheses:

assertEquals("USA (United States of America)",
  text.substring(text.indexOf("USA"), text.indexOf(')') + 1));
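Both indexOf and lastIndexOf return -1 when nothing matches, and passing that result straight into substring either throws or yields the wrong slice. A minimal sketch of a guarded lookup follows; the class BetweenChars and its between method are names made up for this illustration.

```java
import java.util.Optional;

class BetweenChars {

    // Hypothetical helper: returns the text between the first 'open' and the next 'close'
    // after it, or an empty Optional if either character is missing.
    static Optional<String> between(String s, char open, char close) {
        int start = s.indexOf(open);
        if (start == -1) {
            return Optional.empty();
        }
        // look for the closing character only after the opening one
        int end = s.indexOf(close, start + 1);
        if (end == -1) {
            return Optional.empty();
        }
        return Optional.of(s.substring(start + 1, end));
    }
}
```

Searching for the closing character from start + 1 also covers input where a closing character appears before the opening one.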

4. Using subSequence

The String class provides another method called subSequence, which acts similarly to the substring method.

The only differences are that it returns a CharSequence instead of a String and that it always requires both a start and an end index:

assertEquals("USA (United States of America)",
  text.subSequence(67, text.length() - 1));
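Because the result is typed as CharSequence, it can be handed directly to APIs that accept that interface, and toString() converts it back when a String is needed. A small sketch, with the class name SubSequenceDemo and both method names chosen just for this example:

```java
class SubSequenceDemo {

    static final String TEXT = "Julia Evans was born on 25-09-1984. "
      + "She is currently living in the USA (United States of America).";

    // Passes the slice to StringBuilder.append(CharSequence) without converting it first.
    static String countryInBrackets() {
        CharSequence country = TEXT.subSequence(67, TEXT.length() - 1);
        StringBuilder sb = new StringBuilder("[");
        sb.append(country);
        sb.append(']');
        return sb.toString();
    }

    // Converts the CharSequence back to a String explicitly.
    static String countryAsString() {
        return TEXT.subSequence(67, TEXT.length() - 1).toString();
    }
}
```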

5. Using Regular Expressions

Regular expressions will come to our rescue if we have to extract a substring that matches a specific pattern.

In the example String, Julia’s date of birth is in the format “dd-mm-yyyy”. We can match this pattern using the Java regular expression API.

First of all, we need to create a pattern for “dd-mm-yyyy”:

Pattern pattern = Pattern.compile("\\d{2}-\\d{2}-\\d{4}");

Then, we’ll apply the pattern to find a match from the given text:

Matcher matcher = pattern.matcher(text);

Upon a successful match, we can extract the matched String:

if (matcher.find()) {
    assertEquals("25-09-1984", matcher.group());
}
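If we need the individual parts of the date rather than the whole match, capturing groups are one option. The sketch below wraps each component of the same pattern in parentheses; the class DateParts and the method firstDate are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateParts {

    // One capturing group per component of the dd-mm-yyyy date
    static final Pattern DATE = Pattern.compile("(\\d{2})-(\\d{2})-(\\d{4})");

    // Returns {day, month, year} for the first date found, or an empty array if there is none.
    static String[] firstDate(String s) {
        Matcher m = DATE.matcher(s);
        if (!m.find()) {
            return new String[0];
        }
        // group(0) is the whole match; groups 1..3 are the parenthesized parts
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }
}
```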

For more details on Java regular expressions, check out this tutorial.

6. Using split

We can use the split method from the String class to extract a substring. Say we want to extract the first sentence from the example String. This is quite easy to do using split:

String[] sentences = text.split("\\.");

Since the split method accepts a regex, we had to escape the period character. The result is an array of two sentences (the second one keeps its leading space).

We can use the first sentence (or iterate through the whole array):

assertEquals("Julia Evans was born on 25-09-1984", sentences[0]);
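Two variations on the same call can be handy here. The sketch below first widens the regex so that whitespace after a period is consumed along with it, then uses the two-argument split with a limit to cut only at the first period. The class and method names are invented for this example.

```java
class SplitDemo {

    static final String TEXT = "Julia Evans was born on 25-09-1984. "
      + "She is currently living in the USA (United States of America).";

    // Splits on a period plus any following whitespace, so no sentence starts with a space.
    static String[] sentences() {
        return TEXT.split("\\.\\s*");
    }

    // A limit of 2 splits at the first period only; the remainder is kept as-is,
    // including its leading space and final period.
    static String[] firstAndRest() {
        return TEXT.split("\\.", 2);
    }
}
```

In both cases the first element is the same first sentence as before; only the handling of the remaining text differs.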

Please note that there are better ways for sentence detection and tokenization using Apache OpenNLP. Check out this tutorial to learn more about the OpenNLP API.

7. Using Scanner

We generally use Scanner to parse primitive types and Strings using regular expressions. A Scanner breaks its input into tokens using a delimiter pattern, which by default matches whitespace.

Let’s find out how to use this to get the first sentence from the example text:

try (Scanner scanner = new Scanner(text)) {
    scanner.useDelimiter("\\.");
    assertEquals("Julia Evans was born on 25-09-1984", scanner.next());
}

In the above example, we set the example String as the source for the scanner.

Then we set the period character as the delimiter. It needs to be escaped; otherwise, it would be treated as a special regular expression character in this context.

Finally, we assert the first token from this delimited output.

If required, we can iterate through the complete collection of tokens using a while loop inside the same try block:

while (scanner.hasNext()) {
    String token = scanner.next(); // consume the token so the loop advances
    // do something with the token
}
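Put together, a complete version of this loop might look like the sketch below, which collects every token into a List. The class ScannerTokens and its sentences method are hypothetical, and the delimiter is widened (an assumption for tidier output) so that whitespace following a period is skipped as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerTokens {

    static final String TEXT = "Julia Evans was born on 25-09-1984. "
      + "She is currently living in the USA (United States of America).";

    // Collects every token found between periods.
    static List<String> sentences(String s) {
        List<String> tokens = new ArrayList<>();
        // try-with-resources closes the scanner once the loop is done
        try (Scanner scanner = new Scanner(s)) {
            scanner.useDelimiter("\\.\\s*");
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }
}
```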

8. Using StringUtils

The Apache Commons libraries add some useful methods for manipulating core Java types. Apache Commons Lang provides a host of helper utilities for the java.lang API, most notably String manipulation methods.

In this example, we’re going to see how to extract a substring nested between two Strings:

assertEquals("United States of America",
  StringUtils.substringBetween(text, "(", ")"));

There is a simplified version of this method in case the substring is nested in between two instances of the same String:

substringBetween(String str, String tag)

The substringAfter method from the same class gets the substring after the first occurrence of a separator.

The separator isn’t returned:

assertEquals("the USA (United States of America).",
  StringUtils.substringAfter(text, "living in "));

Similarly, the substringBefore method gets the substring before the first occurrence of a separator.

The separator isn’t returned:

assertEquals("Julia Evans",
  StringUtils.substringBefore(text, " was born"));
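It’s worth knowing what happens when the separator doesn’t occur at all: according to the Commons Lang Javadoc, substringAfter then returns an empty String, while substringBefore returns the input unchanged. The plain-JDK sketch below mimics that documented behavior for non-null arguments using indexOf. The after and before helpers are hypothetical stand-ins written for illustration, not the library’s implementation, and they skip the null handling that StringUtils provides.

```java
class SeparatorSketch {

    // Hypothetical stand-in for substringAfter: empty String when the separator is absent.
    static String after(String s, String separator) {
        int pos = s.indexOf(separator);
        return pos == -1 ? "" : s.substring(pos + separator.length());
    }

    // Hypothetical stand-in for substringBefore: the whole input when the separator is absent.
    static String before(String s, String separator) {
        int pos = s.indexOf(separator);
        return pos == -1 ? s : s.substring(0, pos);
    }
}
```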

You can check out this tutorial to find out more about String processing using the Apache Commons Lang API.

9. Conclusion

In this quick article, we found out various ways to extract a substring from a String in Java. You can explore our other tutorials on String manipulation in Java.

As always, code snippets can be found over on GitHub.
