remove-text-after-x-in-file

Posted on: 2019-11-19
Tags: Linux

Linux Commands – Remove All Text After X

1. Overview

There are various occasions when we might want to remove the text after a specific character or set of characters. For example, one typical scenario is when we want to remove the extension of a particular filename.

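In this quick tutorial, we’re going to explore several approaches to see how we can manipulate strings to remove text after a given pattern. We’ll be using the Bash shell in our examples. Some of the expansions we’ll see, such as ‘%’ and ‘%%’, work in any POSIX shell, while substring extraction and pattern substitution are extensions that are not part of POSIX, although shells such as ksh and zsh support them too.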

2. Native String Manipulation

Let’s start by taking a look at how we can remove text using some of the built-in string manipulation operations offered by Bash. For this, we’re going to be using a feature of the shell called parameter expansion.

To quickly recap, parameter expansion is the process by which Bash replaces a reference to a variable with its value. To trigger it, we simply use a dollar sign followed by our variable name, enclosed in braces:

my_var="Hola Mundo"
echo "${my_var}"

As expected, the above example results in the output:

Hola Mundo

But as we’re going to see, during the expansion process we can also modify the variable’s value or substitute other values for it.
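As a quick illustration of that point, the sketch below shows two other common expansions: a fallback value and the string length. The variable names greeting and unset_var are made up for this example.

```shell
# Hypothetical variables, used only for illustration
greeting="Hola Mundo"
unset unset_var

# Plain expansion: the reference is replaced by the value
plain="${greeting}"

# ${var:-word} substitutes a fallback when the variable is unset or empty
fallback="${unset_var:-Hello World}"

# ${#var} expands to the length of the value in characters
length="${#greeting}"

echo "$plain"     # Hola Mundo
echo "$fallback"  # Hello World
echo "$length"    # 10
```

Neither expansion changes the variable itself. They only change what the reference expands to.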

Now that we understand the basics of parameter expansion, in the next subsections, we’ll explain several ways to delete parts of a variable’s value.

In all our examples, we’ll focus on a simple use case: removing the extension from a filename.

2.1. Extracting Characters Using a Given Position and Length

We’ll start by seeing how to extract a substring of a particular length using a given starting position:

my_filename="interesting-text-file.txt"
echo "${my_filename:0:21}"

This gives us the output:

interesting-text-file

In this example, we’re extracting a substring from the my_filename variable, starting at position 0 and with a length of 21 characters. In effect, we’re removing all the text from position 21 onward, which in this case is the .txt extension.

Although this solution works, there are some obvious downsides:

Not all the filenames will have the same length
We’d need to calculate where the file extension starts to make this a more dynamic solution
To the naked eye, it isn’t very intuitive what the code is actually doing
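The second point could be addressed by calculating the length at runtime. Here is a minimal sketch, assuming Bash and a filename that contains at least one dot. The variable names extension, base_length and base_name are our own.

```shell
# Assumes Bash: the ${var:offset:length} form is not part of POSIX sh
my_filename="interesting-text-file.txt"

# ${my_filename##*.} strips everything up to the last dot, leaving "txt"
extension="${my_filename##*.}"

# Base length = total length - extension length - 1 for the dot
base_length=$(( ${#my_filename} - ${#extension} - 1 ))

# Extract the first base_length characters
base_name="${my_filename:0:$base_length}"

echo "$base_length"  # 21
echo "$base_name"    # interesting-text-file
```

This works, but it takes three steps to do what should be a single operation, and it misbehaves when the filename has no dot at all.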

In the next example, we’ll see a more elegant solution.

2.2. Deleting the Shortest Match

Now we’re going to see how we can delete the shortest substring match from the back of our variable:

echo "${my_filename%.*}"

Let’s explain in more detail what we’re doing in the above example:

We use the ‘%’ character, which has a special meaning: it removes the shortest match of the pattern that follows from the back of the string
Then we use ‘.*’, a glob pattern (not a regular expression) that matches a literal dot followed by any characters
We then execute the echo command to output the result of this substring manipulation

Again, we delete the substring ‘.txt’, resulting in the output:

interesting-text-file
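To see how the shortest match behaves on other inputs, we can wrap the expansion in a small function. This is only a sketch, and strip_ext is a hypothetical helper name, not a standard command.

```shell
# strip_ext is a hypothetical helper name, not a standard command
strip_ext() {
    # % removes the shortest suffix matching the glob pattern .*
    printf '%s\n' "${1%.*}"
}

strip_ext "interesting-text-file.txt"  # interesting-text-file
strip_ext "hello-world.tar.gz"         # hello-world.tar
strip_ext "README"                     # README (no dot, nothing to strip)
```

When the pattern doesn’t match, the value is left unchanged, so filenames without an extension pass through safely.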

2.3. Deleting the Longest Match

Likewise, we can also delete the longest substring match from our filename. Let’s now imagine we have a slightly more complicated scenario where our filename has more than one extension:

complicated_filename="hello-world.tar.gz"
echo "${complicated_filename%%.*}"

In this variation, ‘%%’ strips the longest match for ‘.*’ from the back of our complicated_filename variable. This matches “.tar.gz”, resulting in:

hello-world
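The difference between the two operators is easiest to see side by side. The sketch below also shows a caveat of the longest match. The versioned variable is a made-up example.

```shell
complicated_filename="hello-world.tar.gz"

shortest="${complicated_filename%.*}"   # removes only .gz
longest="${complicated_filename%%.*}"   # removes .tar.gz

echo "$shortest"  # hello-world.tar
echo "$longest"   # hello-world

# Caveat: %% cuts at the first dot anywhere in the string
versioned="app-1.2.3.tar.gz"   # hypothetical filename
echo "${versioned%%.*}"        # app-1
```

So ‘%%’ is a good fit only when we know the base name contains no dots.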

2.4. Using Find and Replace

In this final string manipulation example, we’ll see how to use the built-in find and replace capabilities of Bash:

echo "${my_filename/.*/}"

In order to understand this example, let’s first understand the syntax of substring replacement:

${string/substring/replacement}

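Now, to put this into context, we’re replacing the first match of ‘.*’ in the my_filename variable with an empty string. Bash treats the substring as a glob pattern and replaces the longest match, so everything from the first dot onward is removed. In this case, we again remove the extension.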
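The following sketch, which assumes Bash, shows how this substitution behaves with one and with several dots, and how a leading ‘%’ in the pattern anchors the match to the end of the value.

```shell
# Assumes Bash: ${var/pattern/replacement} is not part of POSIX sh
my_filename="interesting-text-file.txt"
complicated_filename="hello-world.tar.gz"

# The pattern is a glob, so .* means "a literal dot, then anything"
echo "${my_filename/.*/}"           # interesting-text-file

# The longest match is replaced, so everything from the first dot goes
echo "${complicated_filename/.*/}"  # hello-world

# A leading % anchors the pattern to the end of the value
echo "${my_filename/%.txt/}"        # interesting-text-file
```

The anchored form is useful when we want to remove one specific, known extension and nothing else.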

3. Using the sed Command

In this penultimate section, we’ll see how we can use the sed command. The sed command is a powerful stream editor which we can use to perform basic and complex text transformations.

Using this command, we can find text that matches a pattern and replace it with something else. When the replacement is left empty, the matched text is deleted.

As in our other examples, we’ll simply echo the name of our file after we have removed the extension:

echo 'interesting-text-file.txt' | sed 's/\.txt$//'

In this example, instead of assigning our filename to a variable, we start by piping it into the sed command.

The pattern we search for is ‘\.txt$’. The backslash makes the dot literal, because an unescaped dot matches any character, and the dollar sign anchors the match to the end of the line. As the replacement part of the command is left empty, the match is removed from the filename. Again, the result is the filename without the extension.
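When the extension isn’t known in advance, a more general regular expression can help. The sketch below removes the last dot and everything after it. The strip_last_ext name is a hypothetical helper of our own.

```shell
# strip_last_ext is a hypothetical helper name for this sketch
strip_last_ext() {
    # \.      a literal dot (an unescaped . would match any character)
    # [^.]*   any run of characters that are not dots
    # $       anchor the match to the end of the line
    printf '%s\n' "$1" | sed 's/\.[^.]*$//'
}

strip_last_ext "interesting-text-file.txt"  # interesting-text-file
strip_last_ext "hello-world.tar.gz"         # hello-world.tar
strip_last_ext "README"                     # README
```

Unlike the parameter expansions, sed works with regular expressions, so the dot must be escaped to be treated literally.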

4. Using the cut Command

In this final example, we’ll explore the cut command. As the name suggests we can use the cut command for cutting out sections from text:

echo 'interesting-text-file.txt' | cut -f1 -d"."

Let’s take a look at the command in more detail to understand it properly:

We first use the -f option to specify the field number which indicates the field to extract
The -d option is used to specify the field separator or delimiter, in this example a ‘.’

The cut command splits each input line into fields at every occurrence of the delimiter character. In our example, this gives us two fields separated by the dot. We select the first one and, in the process, discard the ‘.txt’ extension.
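Since cut splits on every delimiter, filenames with several dots deserve a closer look. The sketch below reuses the hello-world.tar.gz filename from earlier to show what -f1 and a field range each keep.

```shell
# With several dots, -f1 keeps only the text before the first one
echo 'hello-world.tar.gz' | cut -d'.' -f1    # hello-world

# A field range keeps more: fields 1 through 2, rejoined with the delimiter
echo 'hello-world.tar.gz' | cut -d'.' -f1-2  # hello-world.tar

# Lines without the delimiter are printed unchanged (unless -s is given)
echo 'README' | cut -d'.' -f1                # README
```

Because the field numbers are fixed, this approach suits inputs where we know how many dots to expect.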

5. Conclusion

In this quick tutorial, we’ve described a few ways that help us remove text from a string.

First, we explored native string manipulation using parameter expansion. Later, we saw an example with the powerful stream editor sed. Then, we showed how we could achieve similar results using the cut command.

As always, the full source code of the article is available over on GitHub.
