NovoSial.org: Useless use of kill -9


The -9 (or KILL) argument to kill(1) should never be used on Unix systems, except as a very last resort. Why? The KILL signal cannot be caught, blocked, or ignored by the process, so the process gets no chance to clean up. This means blasting away with kill -9 may leave child processes of a parent orphaned, the filesystem littered with temporary files, shared memory segments active, lingering sockets, and any atexit(3) code unexecuted. The result? A system in tatters, and an increased risk of unanticipated and hard-to-debug problems.
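To make the difference concrete, here is a minimal sketch rather than anything taken from a real application. A hypothetical worker installs a TERM handler that removes its temporary file; the same worker is then sent KILL instead. The names start_worker, cleanup, a.tmp and b.tmp are invented for illustration, and mktemp -d is assumed to be available.

```shell
#!/bin/sh
# Sketch: a worker that tidies up on TERM gets no such chance on KILL.
# All names here (start_worker, cleanup, a.tmp, b.tmp) are illustrative.

demo_dir=$(mktemp -d)

# Start a background worker that owns the temporary file named in $1,
# then wait until that file has appeared.
start_worker() {
    sh -c '
        tmpfile=$1
        cleanup() { rm -f "$tmpfile"; exit 0; }
        trap cleanup TERM          # KILL cannot be trapped at all
        : > "$tmpfile"             # created only after the trap is in place
        while :; do sleep 1; done
    ' worker "$1" &
    worker_pid=$!
    while [ ! -e "$1" ]; do sleep 1; done
}

# Case 1: TERM. The handler runs and removes the file.
start_worker "$demo_dir/a.tmp"
kill -TERM "$worker_pid"
wait "$worker_pid" || true
if [ -e "$demo_dir/a.tmp" ]; then term_left_file=yes; else term_left_file=no; fi

# Case 2: KILL. The worker dies at once; nothing cleans up after it.
start_worker "$demo_dir/b.tmp"
kill -KILL "$worker_pid"
wait "$worker_pid" || true         # wait reports a non-zero status here
if [ -e "$demo_dir/b.tmp" ]; then kill_left_file=yes; else kill_left_file=no; fi

echo "temporary file left behind after TERM: $term_left_file"
echo "temporary file left behind after KILL: $kill_left_file"

rm -rf "$demo_dir"
```

The worker killed with KILL also leaves its sleep child briefly orphaned, which is a small-scale version of the orphaned children described above.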

Instead, use the TERM signal by default, and only work up to a KILL if less problematic signals prove insufficient:

$ kill 6738
$ kill -INT 6738
$ kill -HUP 6738
$ kill -KILL 6738

If KILL does not cause a process to exit, then the process is likely wedged on I/O or in some other uninterruptible state. A system reboot may be required, or perhaps the forced unmount of a misbehaving NFS mount point.

Use of kill -KILL by default may seem acceptable where a known problematic application is involved; old versions of Netscape often required KILL. However, these are exceptions to the rule: only use KILL for these few known problematic applications, and never by default.

Issues Terminating Reluctant Processes

Cycling through a list of kill signals raises several problems. First, the process might have legitimate reasons for taking seconds, or even tens of seconds, to shut down properly. One product the author has set up, which used Oracle embedded in Java, required over 30 seconds to shut down cleanly after the TERM signal was delivered. Luckily, this was discovered in the test environment, allowing a suitable shutdown script to be written.

Second, delivering different signals to a process ID over time is a race condition: the old process could exit, and a new one reuse the process ID, between the TERM and KILL signals. This risk increases on systems with high process turnover, and also where the kernel randomizes process ID allocation (OpenBSD, for example). Checking the process name or parent process ID will not help, as a new child could be forked from the same parent, with the same name, and end up with the same process ID. A very paranoid script may want to consult the process start time, or other metadata, before blasting away, though the time between the check and the kill is itself a race condition. These may be rare and low-risk race conditions, but how critical are the processes being dealt with? How does the system behave under extreme load?
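The first point can be sketched as a small shell function that sends TERM and then gives the process a configurable grace period before reporting back. This is one possible shape under stated assumptions, not a recommended standard: the function name terminate_gracefully, its return codes, and the 30-second default are all invented for illustration. It deliberately does not escalate to KILL, and it does nothing about the process ID reuse race described above.

```shell
#!/bin/sh
# terminate_gracefully PID [TIMEOUT_SECONDS]   (hypothetical helper)
#
# Sends TERM, then polls once per second with "kill -0" until the
# process is gone or TIMEOUT_SECONDS (default 30) have elapsed.
#
# Returns: 0  the process exited within the grace period
#          1  the process is still present after the timeout
#          2  TERM could not be sent (no such process, or no permission)
#
# Caveat: the PID is trusted for the whole wait; if it were reused by
# a new process in the meantime, this function could not tell.
terminate_gracefully() {
    tg_pid=$1
    tg_timeout=${2:-30}
    tg_waited=0

    if ! kill -TERM "$tg_pid" 2>/dev/null; then
        return 2
    fi

    # "kill -0" delivers nothing; it only checks that the PID can
    # still be signalled.
    while kill -0 "$tg_pid" 2>/dev/null; do
        if [ "$tg_waited" -ge "$tg_timeout" ]; then
            return 1
        fi
        sleep 1
        tg_waited=$((tg_waited + 1))
    done
    return 0
}
```

A shutdown script for the slow application mentioned above might call terminate_gracefully 6738 45 and, on a return value of 1, log the failure for a human to investigate rather than reaching straight for KILL.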

Kill Signals

Kill signals may be given either by name or by number: kill -1 and kill -HUP are equivalent. However, using the name of the signal is safer, as -1 may be mistyped, resulting in a signal being sent to the wrong process or even a process group, depending on how badly the command is mistyped. Always use the name in scripts, as this better documents the signal being sent for future readers of the code.

The HUP signal will “hang up” shells, and is a good way to clear out shells wedged on standard input, or close SSH sessions.

For more information on kill signals, see kill(1) or run kill -l for a list of signals supported on the system in question. kill(2) details the system call, and notably how kill -0 6738 is handled: signal 0 delivers nothing, and only checks that the process exists and may be signalled. For the gory details, read The Design and Implementation of the 4.4BSD Operating System or UNIX Internals: The New Frontiers.
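As a brief illustration of those two facilities, the sketch below wraps kill -0 in a helper and uses kill -l with a number to look up a signal name. The helper name can_signal is made up for this example; the output of kill -l with no argument varies between systems, so only the number-to-name form is shown.

```shell
#!/bin/sh
# can_signal PID   (hypothetical helper)
# Succeeds if PID exists and the caller is permitted to signal it.
# Signal 0 runs the usual existence and permission checks but
# delivers nothing to the process.
can_signal() {
    kill -0 "$1" 2>/dev/null
}

# Look up names by number rather than memorising the numbers.
name_of_1=$(kill -l 1)
name_of_9=$(kill -l 9)
echo "signal 1 is $name_of_1, signal 9 is $name_of_9"

# $$ is the PID of the running shell, which certainly exists.
if can_signal "$$"; then
    echo "this shell (PID $$) can be signalled"
fi
```

Note that a failing kill -0 does not distinguish between a process that is gone and one owned by another user, so can_signal answers "may I signal this?" rather than "does this exist?".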

Popular Culture

Musical Geek Friday #15: Kill -9.