NovoSial.org: Linux Cron Tips

The cron(8) daemon on Unix systems offers simple job scheduling. Consult crontab(5) to learn the scheduling syntax. Note that the various cron implementations do not support the same syntax; be sure to find the authoritative documentation for the cron daemon in question. Alternatives to cron include at(1), cfengine, Quartz for Java, Anacron for laptops, or schedulers that operate over clusters of systems, such as SLURM.

Day-of-Month and Day-of-Week Logic

cron suffers from a little-known inconsistency of logic. This problem may cause jobs to run when they should not, which at best means unexpected runs, and at worst breaks a promised schedule. Suppose a crontab file contains a job that runs only on weekdays:

18 4,12,20 * * 1-5 /somewhere/something

This runs well, and is forgotten, until a new business requirement arrives: in addition to the weekday restriction, the job should run only on the first day of the month. The logical solution would seem to be to simply add the day of month:

18 4,12,20 1 * 1-5 $HOME/bin/do_something

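Except that, for whatever reason, cron treats day-of-month and day-of-week, when neither is *, as a logical OR instead of the AND used between every other field. The entry above therefore runs on every weekday and also on the first day of the month, whatever day of the week that happens to be. This is documented in crontab(5), though it is easily overlooked or forgotten.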
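The difference between the two readings of 18 4,12,20 1 * 1-5 can be spelled out as a pair of shell predicates. This is only an illustrative sketch, not how any cron is implemented; the function names cron_match and intended_match are made up here, and both take a day-of-month and a day-of-week (1 = Monday through 7 = Sunday) as arguments.

```shell
# cron_match DOM DOW: what cron does when both fields are restricted.
# The job runs if EITHER the day-of-month OR the day-of-week matches.
cron_match() {
    [ "$1" -eq 1 ] || { [ "$2" -ge 1 ] && [ "$2" -le 5 ]; }
}

# intended_match DOM DOW: what the author of the entry probably wanted.
# The job runs only if BOTH fields match.
intended_match() {
    [ "$1" -eq 1 ] && [ "$2" -ge 1 ] && [ "$2" -le 5 ]
}
```

A Wednesday the 15th satisfies cron_match but not intended_match, which is exactly the unexpected run described above.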

Mitigation

User education can help mitigate this issue, though as it crops up so infrequently, a new hire could easily reintroduce the problem. A development environment will likely not catch the issue, as business realities are unlikely to allow a month or two of testing prior to deployment, and even if tested, someone would have to notice that the job was running when it should not be. This cron feature defies expectations built up by the AND used between every other field, making a tester unlikely to be suspicious of the day-of-month and day-of-week fields both being set.

A cron simulator would be interesting: for a given date range and set of cron jobs, it would emit which jobs run when. This ideally would be built into cron itself, and not emulated (perhaps imperfectly) by a third-party program. Someone would still need to make predictions on how many runs are expected, and investigate why a job is running more often than expected.

Another solution is to keep the cron schedule deliberately simple, perhaps running every hour of every day. Then, code inside the specific job would check the business logic to determine whether or not to run: what days are acceptable, what times are acceptable, whether any scheduled maintenance is ongoing, system and network load, the time since the last successful run, measures of the work that needs to be done, and so forth.

As a simple workaround, use date math, either on the crontab entry, or at the top of the script, to restrict runs as necessary. Note: cron may treat % specially, using that character for a newline expansion. This is the same character used by typical date(1) and more specifically strftime(3) calls typically used to determine the day-of-week or day-of-month in shell scripts.
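A minimal sketch of the date math approach follows, assuming the intent is "run only when the first of the month falls on a weekday". The function name should_run is hypothetical, and the optional arguments exist only so the check can be exercised without waiting for a particular date. The crontab line in the comment assumes a cron that accepts \% as an escaped percent sign; check crontab(5) for the cron in question.

```shell
# In the crontab, restrict only day-of-month and test the weekday in the
# command itself, escaping the percent sign:
#
#   18 4,12,20 1 * * [ "$(date +\%u)" -le 5 ] && $HOME/bin/do_something
#
# Or place an equivalent check at the top of the script:

# should_run [DOW] [DOM]: succeed only on a weekday that is also the 1st.
# Arguments default to today's values.
should_run() {
    dow=${1:-$(date +%u)}   # 1 = Monday ... 7 = Sunday
    dom=${2:-$(date +%d)}   # 01 ... 31
    dom=${dom#0}            # drop the leading zero before comparing
    [ "$dow" -le 5 ] && [ "$dom" -eq 1 ]
}

# Usage at the top of a job script:
#   should_run || exit 0
```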

Daylight Saving Time

Job scheduling may run afoul of the biannual Daylight Saving Time wobbles. This is a concern for cron, as jobs may be run twice, or not at all, depending on when they are run, and which direction the time is wobbling. As an example, a company I once worked for ran a critical billing job at 02:30 on Sunday. This did not run when 02:00 jumped to 03:00 in April, resulting in a high severity incident and messy cleanup, as subsequent processes assumed the billing cycle had run.

If possible, run all systems and services in Coordinated Universal Time (UTC), and only make localtime(3) calls where necessary. Changing existing systems from a local time zone to UTC is not easy, and is usually best addressed when rolling out new systems and services. Unfortunately, many companies start out small, and pick the local time zone for their systems.

Note: if users set a custom TZ environment variable, services restarted may use the wrong time zone. Ensure that the service startup scripts set TZ=UTC, among other environment sanity checks. Also employ sudo -H … when restarting services, as this reduces the odds a user environment setting will be improperly inherited by the service.

If systems must run in a time zone that wobbles, audit the crontab schedules biannually for critical jobs that may not run Sunday between 02:00 and 03:00, or jobs that may run twice during 01:00, depending on which direction the time wobbles. This documentation has assumed United States time zones: other time zones or localities may or may not use different times or days to wobble the time, or Congress may change the rules yet again, resulting in even more problems.

Also, consider documenting the crontab file with leading comments that warn of the biannual timezone wobble:

# NOTE: This crontab runs on a system that uses a local timezone. Due to
# Daylight Saving Time, cron jobs on Sunday must not be scheduled
# between 01:00 and 03:00, unless measures are taken to ensure
# these jobs can handle running twice (during 1 AM) or not at all
# (during 2 AM), depending on how the timezone wobbles.

As another option, schedule a job more times than necessary, and cache the last successful run time. The script can then check the last run time, and only run again should the previous run be far enough in the past. This solves the problem of timezone wobbles skipping an hour, and also missed jobs due to reboots or other transitory connectivity issues.
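One way to sketch the last-run cache in shell is with a timestamp file holding the epoch seconds of the last successful run. The function names due_to_run and mark_success, and the idea of passing the current time as an optional argument, are assumptions made for this example; date +%s is widely available but not guaranteed by POSIX.

```shell
# due_to_run STAMP_FILE MIN_INTERVAL [NOW]: succeed if no successful run is
# recorded, or if the recorded run is at least MIN_INTERVAL seconds old.
due_to_run() {
    stamp_file=$1
    min_interval=$2
    now=${3:-$(date +%s)}
    [ -r "$stamp_file" ] || return 0
    last=$(cat "$stamp_file")
    # Treat an empty or corrupt stamp as "never run".
    case $last in
        ''|*[!0-9]*) return 0 ;;
    esac
    [ $(( now - last )) -ge "$min_interval" ]
}

# mark_success STAMP_FILE [NOW]: record a successful run. The stamp is
# written to a temporary file and renamed so readers never see a partial file.
mark_success() {
    now=${2:-$(date +%s)}
    printf '%s\n' "$now" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Usage inside a job scheduled more often than strictly needed:
#   due_to_run "$HOME/.myjob.last" 82800 || exit 0
#   ... do the work ... && mark_success "$HOME/.myjob.last"
```

Only a run that finishes successfully updates the stamp, so a failed or skipped run is picked up by the next scheduled attempt.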

Shorthand Notation

The @reboot syntax is the most useful of the special @… shorthands (the others cover “hourly”, “daily”, and similar runs): it is a simple way to launch processes at system startup time, especially for users who lack access to install init.d scripts. For example, a user may want ssh-agent run at startup for OpenSSH public key authentication support:

@reboot ssh-agent -s | grep -v echo > $HOME/.ssh-agent

I avoid @hourly and similar time-based shorthand notations, as these run at the top of the hour. This is usually when remote systems or partners run their own jobs, or when databases roll partitions, or when scheduled changes typically start. Jobs at the top of the hour may at best result in network traffic spikes, or at worst production issues as conflicts or unexpected race conditions emerge. Instead, schedule jobs at eight minutes past or some other semi-random time during the hour, and be mindful of what else is running near that time.

Runtime Randomization

Certain jobs, especially if executed on hundreds or thousands of systems, will need to randomize when they run. Randomization avoids network traffic congestion or service brownouts by spreading when the work begins over a period of time. The randomization can either be done inside the script being run, or by inserting a script that delays for a random number of seconds prior to the actual command run in the crontab file.

Delay inside a script:

#!/usr/bin/perl -w
use strict;

my $RUNTIME_DELAY = 60;

# Do not delay when run interactively (STDIN attached to a terminal)
if ( !-t STDIN ) {
    sleep int( rand($RUNTIME_DELAY) + 1 );
}

…

Delay in the crontab file via a custom randsleep script. This is a better option, as the fact that a delay is being used, and the duration of the delay, are obvious in the crontab:

17 * * * * /path/to/randsleep 60s && …
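The randsleep script itself is not shown here, so the following is only a guess at what a minimal version might look like. It assumes the argument is a maximum number of seconds with an optional trailing s, and that /dev/urandom and od(1) are available; the function name random_delay is made up for this sketch.

```shell
#!/bin/sh
# random_delay MAX: print a random whole number of seconds in 0 .. MAX-1.
random_delay() {
    max=$1
    # Reject anything that is not a positive integer.
    case $max in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$max" -gt 0 ] || return 1
    # Two random bytes from the kernel, read as an unsigned integer (0..65535).
    n=$(od -An -N2 -tu2 /dev/urandom | tr -d ' \n')
    echo $(( n % max ))
}

# When called with an argument such as "60" or "60s", sleep and exit 0 so
# that the command after && in the crontab entry then runs.
if [ "$#" -gt 0 ]; then
    delay=$(random_delay "${1%s}") || {
        echo "usage: randsleep SECONDS" >&2
        exit 64
    }
    sleep "$delay"
fi
```

Reading from /dev/urandom rather than seeding from the clock matters here: hundreds of hosts started by cron in the same second would otherwise tend to pick the same delay.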

@reboot jobs may also need runtime randomization, as a large number of systems rebooting at the same time could then generate excess activity as they spin up. This can also be solved by staggering system startup times.

Unique Jobs

Certain jobs may need to run exclusively. That is, no other instance of the job should run at the same time. Candidates include jobs that could take a long time or those that require exclusive access to some resource. rsync jobs can be a particular problem: should rsync not exit before the next rsync is run by cron, the system can enter a downward spiral of resource use. Perl on Unix can take advantage of the __DATA__ filehandle lock trick, among other locking options. See also File Locking Tricks and Traps for more information. Similar methods can be devised for other languages.
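For shell scripts, one commonly used approach (not the __DATA__ trick mentioned above) is a lock directory, since mkdir either creates the directory or fails, with no window in between. The sketch below uses hypothetical function names and a hypothetical lock path; note that a job killed without running its cleanup leaves a stale lock behind that something must eventually remove.

```shell
# acquire_lock DIR: succeed only for the one caller that creates DIR.
acquire_lock() {
    # mkdir is atomic: it fails if the directory already exists.
    mkdir "$1" 2>/dev/null
}

# release_lock DIR: remove the lock directory.
release_lock() {
    rmdir "$1"
}

# Usage at the top of a job script (the lock path is only an example):
#   lock=/tmp/myjob.lock
#   acquire_lock "$lock" || exit 0      # another instance is running
#   trap 'release_lock "$lock"' EXIT    # release on normal exit
#   ... run the job ...
```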

Another alternative is a simple daemon that launches rsync a set period, say five minutes, after the previous rsync exits. Assuming only one such daemon is running, two rsync processes will never run at the same time:

#!/bin/sh

while sleep 300; do
    rsync --timeout=999 …
done

Missed Jobs

cron jobs may not run, due to a system outage, or a laptop being asleep. Solutions include Anacron, or the more flexible scheduling of cfengine. Another case is where the cron job runs, but fails due to the network or some other resource not being available. Transitory errors should not result in permanent problems, and software should be intelligent enough to retry in the face of errors.

One solution is to schedule more jobs than necessary, and use a cache or database to track when the last successful run was. If the last successful run was within a certain time period, the extra cron jobs would then do nothing. Another concern with missed jobs is whether the script expected to be run only at that time (perhaps due to SQL that queries based on the current time), and therefore whether a catchup run will miss anything. For example, a script that pulls the last 24 hours' worth of data from a database fails. Six hours later, someone manually runs the job. However, six hours of data are missing, and the run reflects data from N-18 hours to N+6 hours, not the usual N-24 to N in other reports. Avoid this condition by querying for specific time periods, or by again tracking in a cache or database what the reporting script has already seen, so the script knows to resume from where it left off automatically.
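As a sketch of "querying for specific time periods", a report could compute a fixed window from day boundaries instead of "the last 24 hours from now". The function name report_window is hypothetical, the window here is the previous whole UTC day expressed in epoch seconds, and how those bounds are passed to the SQL is left to the reporting script.

```shell
# report_window [NOW]: print the start and end (epoch seconds) of the most
# recent complete UTC day before NOW. NOW defaults to the current time.
report_window() {
    now=${1:-$(date +%s)}
    end=$(( now - now % 86400 ))    # most recent UTC midnight
    start=$(( end - 86400 ))        # the midnight before that
    echo "$start $end"
}

# Usage: a run at 01:00 and a manual rerun at 07:00 on the same day
# both report on the same window.
#   set -- $(report_window)
#   ... query for rows with timestamp >= $1 and timestamp < $2 ...
```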

Missing Cron E-mail

cron sends e-mail via the system sendmail interface. Delve through the mail server logs if hunting down a missing cron e-mail. Sendmail logs via syslog(3); other Mail Transfer Agents (MTAs) may log to different locations. Syslog messages may be handled by syslogd(8) (try grep mail /etc/syslog.conf to see where the system hides the mail facility logs) or by a replacement daemon such as syslog-ng. cron(8) usually reacts to failures from sendmail (cannot fork, error sending message, and so forth) by attempting to log the error to a log file (if any) or also to syslog(3) (probably under a cron facility). Sendmail in some cases may try to write a dead.letter, but can be thwarted from even writing that by permissions problems or other misconfigurations.

If the MAILTO environment variable is set to the empty string, no e-mail will be sent (nor will any be sent if the script produces no output). As an alternative (or addition) to e-mail, consider sending job status information to syslog(3) via logger(1), or to a database. I tend not to favor e-mail from cron, as it leads to cron spam.

Cron Spam

Cron jobs should control their output so that needless cron e-mail is not generated. Otherwise, companies end up with hundreds of needless cron e-mails being auto-deleted or auto-filtered into unread archive mailboxes, as nobody has the time to correct the cron spam. Instead, standard out and error can be closed or redirected to a logfile, or all logs sent to syslog. Errors can then be detected by log scanning or other mechanisms, not by assuming someone will read the cron spam and notice a problem.
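A small wrapper is one way to keep a job quiet as far as cron is concerned while still recording what happened. The sketch below is an assumption about how this might be done, not a standard tool: the function name run_quietly and the log format are made up, and the log could just as well be sent to syslog instead of a file.

```shell
# run_quietly LOGFILE COMMAND [ARGS...]: run COMMAND with stdout and stderr
# appended to LOGFILE, so cron sees no output and sends no e-mail.
# Returns the exit status of COMMAND for any caller that wants to act on it.
run_quietly() {
    logfile=$1
    shift
    {
        echo "$(date '+%Y-%m-%dT%H:%M:%S') start: $*"
        "$@"
        status=$?
        echo "$(date '+%Y-%m-%dT%H:%M:%S') exit status: $status"
    } >>"$logfile" 2>&1
    return "$status"
}

# Usage inside a job script:
#   run_quietly /var/log/myjob.log /path/to/some/command
```

A log scanner can then look for non-zero "exit status" lines rather than relying on someone reading cron e-mail.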

If the cron job must generate e-mail, use code in the script to send the e-mail, instead of relying on the default e-mail target of cron. In Perl, the MIME::Lite module can generate e-mail. In a shell script, pipe to mail, perhaps using a subshell to capture the output of multiple commands:

(
echo …
someothercommand …
) | mail …

This will also require error checking on the mail call, and perhaps other sanity steps, depending on the importance of the message. I treat e-mail from cron as a problem: either there is something to fix, as a critical error or unusual condition has been reported, or the job is too verbose (and therefore requires correction). If there are many critical errors, then other solutions must be brought to bear.

`/etc/cron.d`

Some versions of cron support reading custom crontab entries from under the /etc/cron.d directory. This can baffle users, who cannot see where the cron job is running from—crontab -l omits listing any jobs under /etc/cron.d—but in the long run is a better method to manage cron jobs specific to different services running on the system. For example, both Apache and a suite of reporting tools could run under the www user account. Using crontab -e, the two different environments would have to agree on how to lock and edit that data. Instead, using /etc/cron.d, the two environments can instead write out different files with their cron jobs.

When installing cron jobs to /etc/cron.d, be sure to use atomic file operations, and touch the /etc/crontab file to ensure cron reads the new data:

#!/usr/bin/perl
use strict;
use warnings;
use File::AtomicWrite ();

my $target = '/etc/cron.d/a_unique_name_we_hope';

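eval {
    # TODO handle signals
    my $aw = File::AtomicWrite->new( { file => $target } );
    my $fh = $aw->fh;

    # TODO write data into $fh …

    # Rename the temporary file into place; without a commit the
    # temporary file is discarded and $target is never updated.
    $aw->commit;
};
if ($@) {
    die "error: could not write $target: $@";
}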

# Notify cron of updates by bumping the access and modification times
my $current_epoch = time();
utime( $current_epoch, $current_epoch, '/etc/crontab' )
    or warn "warning: could not touch /etc/crontab: $!";

Backups

System and service crontabs should not be backed up; instead, they should be reproducible using configuration management software. User crontabs are another matter, as configuration management is perhaps too much overhead for a user. Worse, the user crontab file is not stored under the user home directory, so backups or migrations to new hosts may result in lost crontab data. One option: periodically back up the crontab data to the home directory:

# breaking my own rule about @daily and other timestamp uses...
@daily crontab -l > ~/.cron.`hostname`

`kronsoon`

The kronsoon script generates run-probably-only-once crontab entries for the near future. This is a good way to schedule a job to run soon, and only once, if experimenting or rescheduling a missed job.