Geek Thoughts: bash

Sunday, 13 January 2013

for and while constructs in bash

The `for` construct

When you want to iterate over a list in bash, the first thing that comes to mind is to use a for loop, like this:

for f in abc def; do
    echo $f
done

Simple for loop

This works great when the list to iterate over is short and is composed of items that do not contain any white space. When they do, or the list is long, this construct will get into trouble. Let's demonstrate with a simple example. If I create a file with one item per line, 3 lines like this:

one
two
third line

Simple file called test

Then the first attempt at using a for loop would be:

for f in $(cat test); do
    echo $f
done

Simple for loop to read the file

The result is not quite what was expected:

one
two
third
line

Output of the for loop

You can put double quotes in different places, but this will not solve the problem. This is because the shell splits the output of $(...) on white space before for ever sees the list, and as far as it's concerned, a space character and a newline are the same: both count as separators. Quoting the whole substitution goes too far the other way and yields a single item containing the entire file. Another limitation of the for construct is that the sub-command contained in $(...) needs to be fully executed before for can even start. If the output is large, it can run out of memory or just take a long time to get started.
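To make the splitting visible, here is a minimal sketch that counts how many times the loop body runs with and without quotes. It assumes a scratch file created with mktemp; the variable names tmp, unquoted and quoted are only for illustration.

```shell
#!/bin/bash
# Recreate the three-line test file in a temporary location (illustrative path).
tmp=$(mktemp)
printf 'one\ntwo\nthird line\n' > "$tmp"

# Unquoted: the shell splits the output on spaces, tabs and newlines alike.
unquoted=0
for f in $(cat "$tmp"); do
    unquoted=$((unquoted + 1))
done

# Quoted: no splitting at all, so the body runs once with the whole file in f.
quoted=0
for f in "$(cat "$tmp")"; do
    quoted=$((quoted + 1))
done

echo "unquoted iterations: $unquoted"
echo "quoted iterations: $quoted"
rm -f "$tmp"
```

Neither count matches the three lines in the file, which is why quoting alone cannot fix the for loop.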

The `while` construct

Fortunately, bash has another construct that can bypass those limitations, the while construct. It works slightly differently and needs the help of the read command.

cat test | while read f; do
    echo $f
done

A simple while example

And the result is:

one
two
third line

Output of the while loop

This works because read consumes its input one line at a time and, given a single variable name, assigns the whole line to it instead of breaking it into separate items. Therefore the value that f is set to is a complete line in the file. By default read still trims leading and trailing white space and interprets backslashes; `IFS= read -r f` keeps the line exactly as written. The other advantage is that the pipe actually streams the output of the cat command to while and read, meaning that there is no need to wait until it's finished to handle its output. One typical use of that construct is with the find command: on modern operating systems, file names can have spaces in them, and even with a tight condition, find can return hundreds of lines of output.
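As a sketch of the find use case, the function below prints every .txt file under a directory, one per line and enclosed in brackets. The name list_txt and the scratch directory are assumptions made for the example, not something from a real script.

```shell
#!/bin/bash
# list_txt is a hypothetical helper: print each .txt file under "$1" in brackets.
list_txt() {
    # IFS= keeps leading and trailing blanks, -r keeps backslashes literal.
    find "$1" -type f -name '*.txt' | while IFS= read -r f; do
        printf '[%s]\n' "$f"
    done
}

# Try it on a scratch directory holding one file name with a space in it.
dir=$(mktemp -d)
touch "$dir/plain.txt" "$dir/with space.txt"
list_txt "$dir"
rm -rf "$dir"
```

Two caveats apply. A file name that contains a newline would still be cut in two, since read works line by line. Also, the loop sits on the receiving end of a pipe, so bash normally runs it in a subshell and any variable set inside it is gone once the loop ends.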

Use the right tool for the job

So when should you use which construct?

If you are dealing with a list that can be large or where each item can contain space characters, use while;
If you are dealing with a short list where no item can contain a space character, you can use for.

Tuesday, 19 January 2010

Improving your Linux Skills

I came across a few interesting web sites that are excellent resources if you want to improve your Linux skills:

Tux Training is a generic resource on everything Linux. It's got some very good articles. I may contribute some of my own when I have the time;
If you want to learn the basics of the command line and some basic bash programming, LinuxCommand.org is an excellent resource;
If you are interested in programming, there is a good bash programming introduction at The Linux Documentation Project;
Once you know the basics, TLDP also has an advanced bash programming guide to take you to the guru level.

Saturday, 27 October 2007

Geeky experiments with bash functions

return doesn't mean what you think it does

Modern UNIX shells like bash have the ability to define functions. Functions are a great way to factorise parts of code that you need to use in several areas of your script or isolate discrete pieces of logic. In most programming languages, one fundamental aspect of functions is that they can return a value which is the result of whatever computation they were doing. And indeed a shell like bash has a built-in return command. But, hang on, if you read up on return, you realise that it can only return a small integer, between 0 and 255. The reason for this is that return works the same way as exit: it sets the $? variable to the value given as argument, or to the status of the last command executed if no argument is given, and ends the function. exit ends the whole script instead. So, if you use return, use it to provide the calling code with an error code. This doesn't solve the original problem though: how can we return a value from a function, such as a character string?

As often with UNIX, the answer is deceptively simple and consistent with everything you know about scripts: just echo the value you want to return and call your function as if it were a full blown script, with back quotes or the $(...) construct, as in the example below.

#!/bin/bash

function f {
  echo "[ $1 ]"
  return 1
}

s=`f "abc"`
echo "\$?=$?"
echo "\$s=$s"

Save this in a file called fn.sh, make it executable and run it:

$ ./fn.sh
$?=1
$s=[ abc ]

As you can see, the $? special variable was set with the value 1 and the $s variable was updated with the result of the function.
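Since return is really a status code, it combines naturally with if. The sketch below uses two made-up functions, is_number and last_status, to show a status being tested directly and what a bare return passes back to the caller.

```shell
#!/bin/bash
# is_number is a hypothetical helper: succeed only for a non-empty string of digits.
is_number() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    return 0
}

# last_status is also illustrative: a bare return hands back the status
# of the last command that ran, here the failing "false".
last_status() {
    false
    return
}

if is_number 42; then echo "42 is a number"; fi
if ! is_number abc; then echo "abc is not a number"; fi

rc=0
last_status || rc=$?
echo "bare return gave $rc"
```

The functions print nothing themselves: the calling code only looks at the status, which leaves echo free to carry the actual result when there is one.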

Recursive fun

Once you know how to return a value from your function, the next thing you need is to know how to pass it some parameters. Once again, it works exactly the same as in a full blown script: you just use the $n positional variables. Recursion works as you expect as well. So let's demonstrate with a classic textbook example: a recursive factorial.

#!/bin/bash

function fact {
  if [ $# -lt 1 ]; then
    return 1
  elif [ $1 -lt 1 ]; then
    return 2
  elif [ $1 -eq 1 ]; then
    r=1
  else
    r=$(( $1 * `fact $(( $1 - 1 ))` ))
  fi
  echo "$r"
}

fact $1

Save it, run it and you should get something like the following. Don't give it too high a value though, we'll see why in a second: 10 should be enough to demonstrate that it works.

$ ./fact.sh 10
3628800

While we're here, let's have a quick look at this function as it has a couple of interesting constructs. It does the following:

checks the number of parameters it has been passed, using the $# variable, and returns an error code if there are fewer than 1,
checks that the first parameter is at least 1, as zero or a negative value is not handled, and returns a different error code in that case,
checks the termination condition of the recursion and sets the result if we have reached that condition,
finally calculates the factorial by calling itself recursively.

Note the use of the $((...)) construct to do the relevant arithmetic calculations: one is needed inside the recursive call to the function, to ensure the value passed is the result of $1 - 1 rather than the three parameters $1, - and 1; another one is needed outside the call to calculate the product.
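The difference between one parameter and three is easy to check with a small sketch. count_args is a hypothetical helper that simply reports how many parameters it received.

```shell
#!/bin/bash
# count_args is a hypothetical helper: print the number of parameters received.
count_args() {
    echo "$#"
}

n=5
# Without arithmetic expansion the function sees three separate words: 5, - and 1.
echo "without \$((...)): $(count_args $n - 1) parameters"
# With arithmetic expansion the subtraction happens first and one word is passed.
echo "with \$((...)): $(count_args $(( n - 1 ))) parameter"
echo "value passed: $(( n - 1 ))"
```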

This script also shows that when functions are called this way, the variables set inside the function are not overwritten by a nested call, even though r is never declared local. This is because the back quotes run the function in a subshell, which bash creates by forking a new process. You can verify this by adding a sleep statement inside the function, running the script in the background and running ps:

$ ps
  PID  TT  STAT      TIME COMMAND
  394  p1  S      0:00.07 -bash
 2414  p1  S      0:00.01 /bin/bash ./fact.sh 10
 2415  p1  S      0:00.00 /bin/bash ./fact.sh 10
 2416  p1  S      0:00.00 /bin/bash ./fact.sh 10
 2417  p1  S      0:00.00 /bin/bash ./fact.sh 10
 2418  p1  S      0:00.00 /bin/bash ./fact.sh 10
 2419  p1  S      0:00.00 /bin/bash ./fact.sh 10
 2420  p1  S      0:00.00 /bin/bash ./fact.sh 10
 2421  p1  S      0:00.00 /bin/bash ./fact.sh 10
 2422  p1  S      0:00.00 /bin/bash ./fact.sh 10
 2423  p1  S      0:00.00 /bin/bash ./fact.sh 10
 2424  p1  S      0:00.00 sleep 5

Each child process has its own context and variables and doesn't interfere with the other ones. However, this means that you have to be extremely careful when using functions this way as you could quite easily spawn a large number of processes. Recursion in particular could be deadly.
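The isolation comes from the subshell rather than from the function itself, as the following sketch suggests. The functions set_r and set_r_local are invented for the illustration; the second one relies on the local builtin, which keeps a variable private without forking anything.

```shell
#!/bin/bash
r="caller value"

# set_r is a hypothetical function that assigns to r without declaring it local.
set_r() {
    r="changed by set_r"
    echo "done"
}

out=$(set_r)          # runs in a subshell: the caller's r is untouched
echo "after \$(set_r): r=$r"

set_r > /dev/null     # runs in the current shell: r is overwritten
echo "after direct call: r=$r"

# set_r_local declares r local, so the caller's r survives even a direct call.
set_r_local() {
    local r="private"
    echo "done"
}

r="caller value"
set_r_local > /dev/null
echo "after set_r_local: r=$r"
```

So when a function does not need to hand back a string, calling it directly and declaring its variables local avoids the extra processes altogether.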

Pipe dreams

Finally, if a function generally works like a script, can we pipe it? Yes, but if you want it to be on the consuming side of the pipe, you will need to adapt the function to take its input from stdin rather than from a parameter. And you can even make it work so that it can do both. Here is a modified version of the very first script:

#!/bin/bash

function f {
  if [ $# -ge 1 ]; then
    echo "[ $1 ]"
  else
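    # Read stdin line by line; IFS= and -r keep each line exactly as written
    while IFS= read -r line; do
      if [ -n "$line" ]; then
        # Call f directly: echo `f ...` would fork a subshell and re-split the output
        f "$line"
      fi
    done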
  fi
}

find ~ -type f -print | f

You could apply this construct to most functions: check if there are any parameters, in which case you can use them normally, otherwise read each input line and call the function recursively using the line as parameter. Don't forget to enclose it between quotes though, so that it is passed as a single parameter and blank lines don't trigger an infinite recursion. Run this script and you should get a list of all files in your home directory, with each file enclosed in square brackets.

That's it for functions. Please tell me if any of the examples above don't work for you. I have tested them on Ubuntu Linux, Sun Solaris Express and Mac OS X, so they should be fairly portable, but you never know. They may not work with shells other than bash but feel free to experiment.