Scripting is a valuable skill. Once you know the building blocks, you can make your computer do custom tasks quickly.

Getting started

Tools

These universal tools are worth familiarizing yourself with. They will be most of the glue you need to combine simple tools to perform complex tasks.

  • bash
  • awk
  • grep
  • sed
  • perl
  • sort
  • find

There is a lot of bad bash advice on the internet. It used to be tricky to tell good advice from bad, but now there is a tool that identifies most of the common mistakes:

  • Install and use shellcheck. ShellCheck is a static analysis tool for shell scripts that catches many common bugs, such as unquoted variable expansions.
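
As an illustration of the kind of mistake ShellCheck flags, the sketch below shows an unquoted variable expansion (which ShellCheck reports as SC2086) being split into several arguments. The helper count_args and the file name are made up for this example.

```bash
#!/usr/bin/env bash
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() {
    echo "$#"
}

file='my report.txt'

# Unquoted: the shell splits the value on whitespace, so two arguments arrive.
# ShellCheck reports this expansion as SC2086.
unquoted=$(count_args $file)

# Quoted: the value is passed through as a single argument.
quoted=$(count_args "$file")

echo "unquoted: $unquoted, quoted: $quoted"
```

With the default IFS this prints "unquoted: 2, quoted: 1".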

When your bash script starts to become less glue and more processing, it's time to rewrite the script in a proper general-purpose language like Python.

Check out the bash script template.

Bash script profiling

read empty fields without collapsing whitespace

printf '1 foo\t\t3 baz\t\n' | {
    IFS=$'\t' read -r foo bar baz rest
    printf 'foo: %q\n' "$foo"
    printf 'bar: %q\n' "$bar"
    printf 'baz: %q\n' "$baz"
    printf 'rest: %q\n' "$rest"
}

Output:

foo: 1\ foo
bar: 3\ baz
baz: ''
rest: ''

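Problem: bar should have been empty and baz should have been 3 baz. Tab is an IFS whitespace character, so read treats the two consecutive tabs as a single delimiter and the empty field disappears.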

Solution: split the tabs into newlines, then read each individual line!

printf '1 foo\t\t3 baz\t\n' | tr '\t' '\n' | {
    IFS= read -r foo
    IFS= read -r bar
    IFS= read -r baz
    IFS= read -r rest
    printf 'foo: %q\n' "$foo"
    printf 'bar: %q\n' "$bar"
    printf 'baz: %q\n' "$baz"
    printf 'rest: %q\n' "$rest"
}

Output:

foo: 1\ foo
bar: ''
baz: 3\ baz
rest: ''

# Baseline: every row has the same number of columns and no column is empty
$ printf 'foo\tbar\tbaz\n1aaa\t2bbb\t3ccc\n' | while IFS=$'\t' read -r a b c; do echo "a: $a"; echo "b: $b"; echo "c: $c"; done
a: foo
b: bar
c: baz
a: 1aaa
b: 2bbb
c: 3ccc

# Same input, with each column converted to its own line
$ printf 'foo\tbar\tbaz\n1aaa\t2bbb\t3ccc\n' | tr '\t' '\n' | while { IFS= read -r a; IFS= read -r b; IFS= read -r c; }; do echo "a: $a"; echo "b: $b"; echo "c: $c"; done
a: foo
b: bar
c: baz
a: 1aaa
b: 2bbb
c: 3ccc

# Enable reading empty columns: Replace columns with lines, works if every row has an equal number of columns
$ printf 'foo\t\tbaz\n1aaa\t2bbb\t3ccc\n' | tr '\t' '\n' | while { IFS= read -r a; IFS= read -r b; IFS= read -r c; }; do echo "a: $a"; echo "b: $b"; echo "c: $c"; done
a: foo
b:
c: baz
a: 1aaa
b: 2bbb
c: 3ccc

# Pitfall: replacing columns with lines works only if every row has an equal number of columns. A short row is silently dropped (here the 2nd row is missing its 3rd column)
$ printf 'foo\t\tbaz\n1aaa\t2bbb\n' | tr '\t' '\n' | while { IFS= read -r a; IFS= read -r b; IFS= read -r c; }; do echo "a: $a"; echo "b: $b"; echo "c: $c"; done
a: foo
b:
c: baz

# Support empty columns and variable width rows
$ printf 'foo\t\tbaz\n1aaa\t2bbb\n' | awk -F '\t' -v width=3 '{printf "%s", $0; for(c=NF;c<width;c++){printf "%s", FS}; printf "\n"}' | tr '\t' '\n' | while { IFS= read -r a; IFS= read -r b; IFS= read -r c; }; do echo "a: $a"; echo "b: $b"; echo "c: $c"; done
a: foo
b:
c: baz
a: 1aaa
b: 2bbb
c:
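
The padding step in the last example can be wrapped in a small function so the pipeline reads more easily. This is a minimal sketch that reuses the same awk logic; the name pad_columns is an assumption made for illustration, and treating an empty line as one empty column is a choice of this sketch, not something the one-liner above does.

```bash
#!/usr/bin/env bash
# pad_columns WIDTH: pad every tab-separated row on stdin to WIDTH columns.
# (Hypothetical helper name; same idea as the awk one-liner above.)
pad_columns() {
    awk -F '\t' -v width="$1" '{
        printf "%s", $0
        # An empty line has NF == 0; count it as one empty column.
        have = (NF > 0) ? NF : 1
        for (c = have; c < width; c++) printf "%s", FS
        printf "\n"
    }'
}

# Pad to 3 columns, turn columns into lines, then read 3 lines per row.
printf 'foo\t\tbaz\n1aaa\t2bbb\n' | pad_columns 3 | tr '\t' '\n' |
    while IFS= read -r a && IFS= read -r b && IFS= read -r c; do
        printf 'a=%s b=%s c=%s\n' "$a" "$b" "$c"
    done
```

Both rows are printed, with b empty in the first row and c empty in the second.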

sort by columns out of order

Problem

A single sort key (-k) can cover one column or a contiguous range of columns, but it can't express columns in reverse order or columns that are not next to each other.

Solution

  • Use awk to pick and choose the columns you want and prepend those columns to the lines
  • sort on the first columns of the lines
  • cut those columns off

Example

input.txt

2500 hello world3 3333
2500 hello world2 1111
2000 hello world1 3333

Command:

cat input.txt |
    awk -v 'OFS=\t' '{print $4, $1, $0}' |
    sort -k1,2 |
    cut -f3-

Output:

2500 hello world2 1111
2000 hello world1 3333
2500 hello world3 3333
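
The three steps can also be sketched as a reusable function. The name sort_by_cols is hypothetical, and forcing LC_ALL=C (plain byte-order comparison) is an assumption made here to keep the result the same across locales. Keys are compared as text, not as numbers.

```bash
#!/usr/bin/env bash
# sort_by_cols COL...: sort whitespace-separated lines from stdin by the
# given columns, in the order given. (Hypothetical helper name.)
sort_by_cols() {
    local n=$#        # number of key columns
    local cols="$*"   # e.g. "4 1"
    # 1. Prepend the chosen columns, tab-separated, to every line.
    awk -v cols="$cols" -v 'OFS=\t' '
        BEGIN { n = split(cols, c, " ") }
        {
            key = ""
            for (i = 1; i <= n; i++) key = key $(c[i]) OFS
            print key $0
        }' |
        # 2. Sort on the prepended columns (assumption: byte order is fine).
        LC_ALL=C sort -t $'\t' -k "1,$n" |
        # 3. Cut the prepended columns off again.
        cut -f "$((n + 1))-"
}
```

For the example above, sort_by_cols 4 1 < input.txt orders the lines by the fourth column and then by the first.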

Process in parallel but output in order

If GNU parallel is available, use parallel --keep-order. If it isn't, the approach below works with standard tools:

Problem

You are running multiple commands in the background, and they may print their output in a non-deterministic order.

Solution

  • Number the jobs with nl, and prefix each job's output lines with the job number and a per-job line number
  • sort the lines by those numbers
  • cut the numbers off

Example

Command with non-deterministic output:

( echo 2; echo 1; echo 4; echo 3; ) | while read -r line; do
  (
    echo "$line before"
    sleep "$line"
    echo "$line after"
  ) &
done | cat

Output:

2 before
1 before
4 before
3 before
1 after
2 after
3 after
4 after

Command adjusted for ordered output:

( echo 2; echo 1; echo 4; echo 3; ) | nl -d '' -n rz -w 6 | while IFS=$'\t' read -r order line; do
  (
    echo "$line before"
    sleep "$line"
    echo "$line after"
  ) | nl -d '' -n rz -w 6 | sed "s/^/$order\t/" &
done | sort -k1,1 | cut -f3-

Output:

2 before
2 after
1 before
1 after
4 before
4 after
3 before
3 after
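
The same technique can be sketched as a function that runs an arbitrary command once per input line. The names ordered_parallel and slow_echo are hypothetical. Unlike the command above, this sketch uses nl -b a so that empty lines are numbered too, and it inserts a literal tab with $'\t' rather than relying on sed to interpret \t.

```bash
#!/usr/bin/env bash
# ordered_parallel CMD [ARG...]: run CMD once per stdin line, all jobs in the
# background, and print each job's output grouped in input order.
# (Hypothetical helper name.)
ordered_parallel() {
    # Number the jobs: "000001<TAB>line".
    nl -b a -n rz -w 6 | while IFS=$'\t' read -r order line; do
        # Number each job's output lines, then prefix the job number.
        "$@" "$line" | nl -b a -n rz -w 6 | sed "s/^/$order"$'\t'"/" &
    done |
        # Sort by job number, then by line number within the job.
        sort -t $'\t' -k1,1 -k2,2 |
        # Drop both numbers.
        cut -f3-
}

# slow_echo is a made-up job for demonstration: it sleeps for $1 seconds.
slow_echo() {
    echo "$1 before"
    sleep "$1"
    echo "$1 after"
}
```

Running printf '2\n1\n4\n3\n' | ordered_parallel slow_echo should take about as long as the slowest job and print each job's lines together, in input order.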

Zsh gotchas

  • Arrays are 1-indexed in zsh by default (unless the KSH_ARRAYS option is set), but 0-indexed in bash and ksh
  • zsh keeps only one coproc at a time, so a coproc started in the interactive shell can't be restored if a sourced script or dotfile starts its own coproc
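
For the array indexing difference, one way to write code that behaves the same in both shells is to avoid subscripts and use offset/length slicing instead. This is a minimal sketch: it runs as shown in bash, and the same expansions are expected to give the same result in zsh with default options, though that has not been checked against every zsh version. The array contents are made up.

```bash
#!/usr/bin/env bash
fruits=(apple banana cherry)

# ${fruits[0]} is "apple" in bash but not in zsh with default options,
# so use ${array[@]:offset:length}, where the offset counts from 0.
first=${fruits[@]:0:1}
second=${fruits[@]:1:1}

echo "first: $first, second: $second"
```

In bash this prints "first: apple, second: banana".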