Sunday, 13 January 2013

for and while constructs in bash

The for construct

When you want to iterate over a list in bash, the first thing that comes to mind is to use a for loop, like this:

for f in "abc def"; do
    echo $f
done

Simple for loop

This works great when the list to iterate over is short and is composed of items that do not contain any white space. When they do, or the list is long, this construct will get into trouble. Let's demonstrate with a simple example. If I create a file with one item per line, 3 lines like this:

one
two
third line

Simple file called test

Then the first attempt at using a for loop would be:

for f in $(cat test); do
    echo $f
done

Simple for loop to read the file

The result is not quite what was expected:

one
two
third
line

Output of the for loop

You can put double quote in different places, this will not solve the problem. This is because the for construct splits items against white space and as far as it's concerned, an actual space character or a carriage return are the same and count as separators. Another limitation of the for construct is that the sub-command contained in $(...) needs to be fully executed before for can even start. If the output is large, it can run out of memory or just take a long time to get started.

The while construct

Fortunately, bash has another construct that can bypass those limitations, the while construct. It works slightly differently and needs the help of the read command.

cat test | while read f; do
    echo $f
done

A simple while example

And the result is:

one
two
third line

Output of the while loop

This works because the read command reads a full line and does not split on white space. Therefore the value that f is set to is a complete line in the file. The other advantage is that the pipe actually streams the output of the cat command to while and read, meaning that there is no need to wait until it's finished to handle its output. One typical use of that construct is when using the find command: with modern operating systems, file names can have spaces in them and even with a tight condition, find can return hundreds of lines of output.

Use the right tool for the job

So when should you use which construct?

  • If you are dealing with a list that can be large or where each item can contain space characters, use while;
  • If you are dealing with a short list where no item can contain a space character, you can use for.