Tips and tricks for the Unix shell environment. Shell examples assume a non-csh-based shell, such as ZSH.
Books
- Learning the bash Shell. Good introduction to using and scripting bash. I use zsh, as various bash bugs annoy me more than anything in zsh does.
- Portable Shell Programming. Great book, covers the many pitfalls of shell portability, along with advanced shell scripting.
- The UNIX Philosophy.
- The Unix Programming Environment.
Nifty Third-Party Utilities
- moreutils - collection of various interesting or downright handy tools.
- pv - a terminal-based tool for monitoring the progress of data through a pipeline.
- ShUnit or ShUnit2 provide unit tests for shell code.
Articles
- allsh - run commands in all running shells.
- Avoid use of the /tmp directory.
- Background Shell Command Tips.
- Debugging shell environments.
- Exporting Environment Variables.
- Portability Tips.
- Redirect output on the fly with exec.
- Shell Loop Interaction with ssh - note that the shell while and for builtins have a number of edge cases that may omit certain lines, or automagically create new records to loop over. Use them with caution! One such edge case is sketched just after this list.
- Subshells - tips on using shell subshells.
- Useless use of cat.
- Useless use of kill -9.
- Using the shell on Mac OS X.
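As a sketch of those while edge cases (the demo file name is arbitrary): without -r, read eats backslashes, and a final line that lacks a trailing newline is dropped by the loop entirely.
$ printf 'a\\b\nlast line, no newline' >demo
$ while read line; do echo "$line"; done <demo
ab
Using read -r, and ensuring the input ends with a newline, avoids both surprises.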
Tips & Tricks
- Order of Operation
- Changing Directories
- Redirection
- Redirect Testing
- Pipe to command arguments
- Nesting Backticks
- Stubborn Files
- Qualify the path
- Find inode and delete by that
- Large numbers of files
- Looping without for or while
- Fun with SSH
- Clobbering a File
- Grepping for processes
- Pass input to command
- Filenames with spaces
- Null Values
Knowing the order of operation in a shell is critical. For example, what does the following print? Assume that the abc environment variable has not been set prior to this line.
$ env abc=def echo $abc
zshexpn(1) details the order of operations for ZSH. In particular, note when $abc is expanded: before or after the shell tries to execute env?
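The answer: before. The calling shell expands the unset $abc to nothing before env ever runs, so the command prints an empty line. Single quotes defer the expansion to a child shell, which does see the variable (assuming a POSIX sh is available):
$ env abc=def sh -c 'echo $abc'
def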
Always check the error status of chdir, to avoid running commands in the wrong directory. Alternatively, use fully qualified paths to obviate the need for chdir(2).
#!/bin/sh
cd "$nosuchdir" || exit 1
rsync elsewhere:/foo .
Without || exit 1 to abort the script should cd fail, the subsequent rsync command could move files to the wrong location or fill up the wrong partition.
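A slightly more talkative variant reports why the script aborted ($somedir is a placeholder for whatever directory the script needs):
#!/bin/sh
cd "$somedir" || { echo "cd $somedir failed" >&2; exit 1; }
rsync elsewhere:/foo .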
Redirection can take place (almost) anywhere, not just at the end.
$ echo a b >c
$ echo >c a b
$ >c echo a b
Placing the filename at the beginning allows easier editing of the search term at the end of the command.
$ </var/log/messages grep foo
$ </var/log/messages grep bar
$ </var/log/messages grep user1
Shell redirects are baffling: sometimes the 2>&1 goes after the other redirect character, sometimes before. I test whatever it is I am trying to do against the following script, and fiddle with the redirects until the output is handled properly:
#!/bin/sh
# errnout - print to STDOUT and STDERR for testing
echo out
echo >&2 err
$ errnout
out
err
$ errnout >/dev/null 2>&1
$ errnout 2>&1 >/dev/null | grep .
err
Redirects are processed from left to right: in the second command the pipe is set up first, 2>&1 then points stderr at the pipe (where stdout currently goes), and >/dev/null finally moves stdout away, so only the err line reaches grep.
I use xargs(1) frequently to convert output from something (file, or another program) to arguments to another command. For instance, to commit only modified files in a cvs sandbox where there may be conflicted, new, or other troublesome files mixed in, use the following.
$ cvs up | perl -ne 'print if s/M //' | xargs cvs ci
Depending on the editor, one can use the concept above to open certain files for editing, for example files in a cvs sandbox that have conflicts.
$ cvs up | perl -ne 'print if s/C //' | xargs vi
ex/vi: Vi's standard input and output must be a terminal
$ cvs up | perl -ne 'print if s/C //' | xargs emacs
emacs: standard input is not a tty
$ cvs up | perl -ne 'print if s/C //' | xargs bbedit
The bbedit utility is part of BBEdit for Mac OS X, and avoids terminal issues by sending the files to the BBEdit application. Using emacs in server/client mode may avoid this problem for emacs. The alternative is to use backticks (or $( … ), which avoids the nesting problems of backticks) to make the files available as arguments to the program, instead of feeding them in through xargs.
$ vi `cvs up | perl -ne 'print if s/C //'`
xargs can be chained with other programs. For instance, one may want to find perl scripts containing the text While and do something with them.
$ find . -name '*.pl' \
| xargs fgrep -l While \
| xargs perl -i -ple 's/While/while/g'
xargs will fail or do the wrong thing if the filenames passed to it contain spaces. This is common on filesystems that traditionally have allowed spaces in filenames (Mac OS), or where file trees have been uploaded to Unix from other platforms. If using find/xargs pairs, the spaces-in-filenames problem can be avoided as follows.
$ find . -type f -print0 | xargs -0 echo
Backticks do not nest, so are only suitable for one level:
$ ls -d `echo /tmp`
/tmp
$ ls -d ``echo `echo /tmp``
bquote>
Instead, use the $(…) construct in modern shells:
$ ls -d $(echo $(echo $(echo $(echo /tmp))))
/tmp
$(…) can also replace useless uses of cat: simply write $(< a_file) instead of the wasteful $(cat a_file).
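For instance (a_file here is a scratch file created just for the demonstration):
$ echo hello >a_file
$ data=$(< a_file)
$ echo $data
hello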
Dealing with files that have odd characters in their names can often be a chore on Unix, as one cannot type in the names in question. One could use a graphical file manager tool, but I find those cumbersome, ill suited to dealing with large numbers of files, and usually not installed on server systems.
To simply delete the bad filenames, there are a few options.
Files that begin with a hyphen (such as a file named -rf) will be caught by option processing. These can be avoided either by disabling option processing, or by prefixing a directory name to the file path.
$ ls
-rf
$ rm -rf
$ ls
-rf
$ rm *
$ ls
-rf
$ rm -- -rf
$ ls
$ touch ./-rf
$ ls
-rf
$ rm ./-rf
$ ls
$
The -- argument only works on systems whose getopt(3) library supports the syntax. On other systems, or for portability, the qualified path option must be used.
Each file on a Unix filesystem has an inode number associated with it; knowing the inode number of the bad file allows us to search for and delete it.
$ ls -i *
615383 foo
$ find . -inum 615383 -exec rm {} \;
If there are large numbers of files with wacky characters in their filenames, something more powerful than the shell is usually required to filter out the files in question. For instance, to list the inode number of files in the current directory with non-printable characters in their names, use perl.
$ ls -i | perl -nle 'print if /[[:^print:]]/' \
| while read inum name; do echo $inum; done
For situations where the mangled filenames are in deep directory trees, or where the mangling is consistent (uploaded filenames from a DNA sequencer come to mind), use File::Find and write a standalone script.
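A minimal sketch of such a standalone script, assuming the job is only to report files whose names contain non-printable characters (the script name and default directory are illustrative):
#!/usr/bin/perl
# badnames - walk directory trees, reporting files whose
# names contain non-printable characters
use strict;
use warnings;
use File::Find;

find(
    sub { print "$File::Find::name\n" if /[[:^print:]]/ },
    @ARGV ? @ARGV : '.',
);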
Directories with huge numbers of files will cause rm * to fail, as the wildcard expands to an argument list longer than the system allows. To delete all the files, remove the parent directory (and recreate it if needed). If only deleting by a pattern, use readdir to loop over each file in turn, and apply a match to each filename.
$ rm -rf /the/bad/dir
$ perl -e 'chdir shift or die "$!"; opendir D, "." or die "$!";' \
-e 'while (defined($_ = readdir D)) { unlink if -f and m/\.doc/ }' /the/bad/dir
To loop over a list of words without the shell for or while builtins, convert the list to one word per line and feed it to xargs -n 1, which runs the given command once per argument:
$ echo foo bar | sed 's/ /\
/g' | xargs -n 1 echo ls
ls foo
ls bar
Commands can be run over ssh(1), though how the shell handles more complex commands can cause problems.
client$ ssh example.org hostname
server.example.org
client$ ssh example.org sleep 3 && hostname
client.example.org
The && is handled by the local shell, not the remote server. Quoting can fix the problem.
client$ ssh example.org 'sleep 3 && hostname'
server.example.org
Shell Loop Interaction with ssh talks about problems with ssh and the shell while builtin.
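A sketch of the classic problem (the hostnames are hypothetical): ssh reads from the loop's standard input and drains the remaining lines, so the loop body runs only once. The -n flag, which redirects ssh's stdin from /dev/null, is one workaround.
$ printf 'host1\nhost2\n' | while read h; do ssh -n "$h" uptime; done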
There are several ways one can empty the contents of an existing file without removing and touch(1)-ing the file in question. Using echo(1) is not portable, as some systems do not support the -n flag, such as Digital UNIX without CMD_ENV=bsd set. The use of the shell null operator : is a clever way, and saves typing.
$ cat /dev/null >file
$ echo -n >file
$ : >file
This leads to a smiley operator :> that helps free up disk space:
$ :> bigfile
When grepping a process listing, grep can match itself. To avoid this problem, the regular expression can be altered so grep does not match itself, which is easier than appending | grep -v grep to a command.
$ ps wwo pid,command | grep 'ss[h]'
The regular expression ss[h] cannot match the literal string ss[h] that appears in grep's own command line in the process listing, but will match any process name containing ssh. Another option: use commands such as pgrep(1).
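pgrep never matches its own process, and with -l it also lists the matched process names (the PID shown is of course illustrative):
$ pgrep -l ssh
1234 sshd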
Some utilities are controlled via command interfaces. Full control of command interfaces may require the expect utility, or Expect. Simple needs can be met by printing commands on standard input, then parsing the output. For example, the Mac OS X scutil can be queried for information:
SERVICE_GUID=`<<EOF scutil | awk '/PrimaryService/{print $3}'
open
get State:/Network/Global/IPv4
d.show
EOF`
echo $SERVICE_GUID
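A sketch of the same query using the modern $(…) form, where the single quotes around the awk program need no extra care:
SERVICE_GUID=$(scutil <<EOF | awk '/PrimaryService/{print $3}'
open
get State:/Network/Global/IPv4
d.show
EOF
)
echo $SERVICE_GUID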
If parsing a list of filenames from a file, spaces in filenames may cause shell interpolation problems. This can be worked around by converting a newline between different filenames into the nul character, then passing that list to xargs -0:
$ touch "foo bar"
$ touch "baz zot"
$ (echo foo bar; echo baz zot) | tr '\n' '\0' | xargs -0 file
foo bar: empty
baz zot: empty
Null (\0) may show up in data, though more often it is used as a record separator in xargs -0 or perl -0 … commands. cat may show the character as ^@, though a better tool such as od(1) or xxd(1) should be used to reveal what the data actually is (results from these commands can be looked up in ascii(7)):
$ perl -le 'print "\0"' | cat -v
^@
$ perl -le 'print "\0"' | od -bc
0000000 000 012
\0 \n
0000002
$ perl -le 'print "\0"' | xxd
0000000: 000a ..
Unwanted nulls can be removed with, for example, perl (though often not with older versions of sed(1)):
$ perl -le 'print "\0\0blah"' | perl -pe 'tr/\0//d' | od -bc
0000000 142 154 141 150 012
b l a h \n
0000005
$ perl -le 'print "\0\0blah"' | sed 's/..//' | od -bc
0000000 142 154 141 150 012
b l a h \n
0000005
$ perl -le 'print "\0\0blah"' | sed 's/\0\0//' | od -bc
0000000 000 000 142 154 141 150 012
\0 \0 b l a h \n
0000007