NovoSial.org: rsync Tips & Tricks

Tips on using the rsync command. rsync performs incremental filesystem transfers, allowing filesystem duplication or snapshots. Alternatives to rsync on Unix systems include cp -r, pipes between tar commands, or unison.

rsync behaves differently if the source directory has a trailing slash. Study and learn the difference between the following two commands before moving on. Use the -n option to rsync when testing to preview what would happen.

$ rsync -n -av /tmp .
$ rsync -n -av /tmp/ .
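
To see the difference without touching real data, the following sketch copies a scratch directory both ways. The directory names (src, no_slash, with_slash) are made up for this illustration, and the rsync calls are skipped if rsync is not installed.

```shell
#!/bin/sh
# Scratch area for the demonstration; all names below are illustrative only.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/no_slash" "$work/with_slash"
echo hello > "$work/src/file.txt"

if command -v rsync >/dev/null 2>&1; then
    # No trailing slash: the directory itself is copied into the target.
    rsync -a "$work/src" "$work/no_slash"
    # Trailing slash: only the contents of the directory are copied.
    rsync -a "$work/src/" "$work/with_slash"
    # Prints:
    #   no_slash/src/file.txt
    #   with_slash/file.txt
    (cd "$work" && find no_slash with_slash -type f | sort)
fi
# Remove "$work" by hand once done inspecting the result.
```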

The ZSH option, AUTO_REMOVE_SLASH, may cause trouble if the trailing slash is necessary. If so, try disabling the option.

Note that rsync may not suit all use cases. Even the latest versions of rsync may consume too much memory or cause too many I/O operations when too many files need to be synchronized in a single run. A parallelizing wrapper around rsync or some other solution may be necessary. On standalone filesystems, FAM, inotify, or kqueue(2) offer notification methods that should be more efficient than rsync sweeping a filesystem tree again and again for the same metadata. Other options include the various commercial and cluster filesystems that can replicate changes between systems.

Permissions & Ownership

Normally, the -a option can be used to perfectly mirror the files. However, if the target filesystem does not support permissions, a different set of options should be used to avoid warnings from rsync. To synchronize data to a USB drive with a FAT filesystem, I use the -rlt options.

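#!/bin/sh
# Tilde is not expanded inside double quotes, so use $HOME for the exclude file.
RSYNC="rsync --size-only --delete --delete-excluded --exclude-from=$HOME/.rsync/exclude -rlt"

TARGET=$1
# $RSYNC is left unquoted on purpose so the shell splits it into arguments.
$RSYNC ~/Talks "$TARGET"

mkdir -p "$TARGET/backup/repository"
$RSYNC ~/share/repository/ "$TARGET/backup/repository"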

A ~/.rsync/exclude can be used to list common file patterns to ignore, for example .DS_Store files on Mac OS X.

$ cat ~/.rsync/exclude
.DS_Store
.FBCLockFolder

Secure Network Transfers

Set the -e option of rsync to use ssh instead of rsh when copying to a remote system. Modern versions of rsync use ssh by default. While ssh is slower than rsh, the data being transferred will be encrypted, and therefore less likely to be maliciously read or altered, or randomly corrupted in transit.

$ rsync -e 'ssh -ax' -avz example.org:/tmp .

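If speed is a concern, use a weaker encryption option to ssh. Recent OpenSSH releases no longer offer the blowfish cipher shown here; ssh -Q cipher lists the ciphers the local client supports.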

$ rsync -e 'ssh -ax -c blowfish' -avz example.org:/tmp .

The -ax options to ssh disable Secure Shell (SSH) agent and X11 forwarding, which are not needed by rsync. Also consider setting -o ClearAllForwardings to ssh, to prevent possible automatic port forwards. For more information on options to OpenSSH, peruse ssh(1) and ssh_config(5).
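
As a rough sketch, the ssh options above can be set once through the RSYNC_RSH environment variable, which rsync reads when -e is not given. Combining -ax with ClearAllForwardings here is only one possible choice; adjust the options to suit.

```shell
#!/bin/sh
# rsync uses RSYNC_RSH as the remote shell command when -e is not given.
# The option set below is an example, not a recommendation for every site.
RSYNC_RSH='ssh -ax -o ClearAllForwardings=yes'
export RSYNC_RSH

# Later rsync calls in the same script can then omit -e, for example:
#   rsync -avz example.org:/tmp .
```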

Timeout Option

To avoid stalls, set the --timeout option, though not low enough that rsync times out before it can build the filesystem differences in memory. In rare cases I have seen rsync not exit, so for unattended runs like filesystem snapshots I set the --timeout option to ensure the command will eventually quit.
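
For unattended runs it can help to tell a timeout apart from other failures. The sketch below maps a few exit codes from the EXIT VALUES section of rsync(1) to messages; the function name rsync_status_message is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: translate an rsync exit code into a short message.
# Codes 30 and 35 are the timeout codes listed in rsync(1).
rsync_status_message() {
    case "$1" in
        0)  echo "ok" ;;
        30) echo "timeout in data send/receive" ;;
        35) echo "timeout waiting for daemon connection" ;;
        *)  echo "failed with exit code $1" ;;
    esac
}

# Possible use after a run with a timeout set:
#   rsync -a --timeout=999 "$HOME/" server:backup/client
#   rsync_status_message $?
```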

Backups with `rsync` and SSH

Notes on how to configure periodic rsync runs over SSH. Filesystem duplication or snapshot scripts may set the following up in different ways; the method outlined here mirrors the home directory of the user running the script from a client system to a backup server.

Set up an SSH key pair without a passphrase.

A key pair whose private key has no passphrase allows unattended periodic backups. The public key should be locked down to only allow backups on the system the rsync is done to. These notes are for OpenSSH as of version 3.8.

On the system the rsync backup script is run on (client in these notes), create an SSH key pair.

client$ ssh-keygen -N '' -C backup1 -t rsa -f ~/.ssh/backup

Configure public key on backup server.

On the system the rsync backup script connects to (server in these notes), configure the public key. These notes cover OpenSSH; consult the manual if a different SSH implementation is being used. See also the details on common problems with OpenSSH public key authentication.

client$ scp ~/.ssh/backup.pub server:.ssh

server$ cd ~/.ssh
server$ cat backup.pub >> authorized_keys
server$ rm backup.pub

For security, the authorized_keys file should be edited to only allow specific commands, among other limitations. For more information, see sshd(8) and sshd_config(5). The command limitation to use can be determined by running rsync with the -e 'ssh -v -v -v' option to see the exact command run on the server. (Or run rsync with the -vv option, instead of specifying the verbose options to ssh.)

The following example shows how to restrict a public key in the authorized_keys file to only run the specified command, along with other restrictions on the connection. The limitations must be listed on one line, prior to the lengthy public key data.

command="rsync --server -v --timeout=999 --delete-excluded . backup/client",↵
no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-rsa AAAAB3Nza…
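
Editing the long key line by hand is error-prone, so a small helper can build it instead. This is only a sketch: the function name restrict_key is invented here, and it assumes the forced command contains no double quotes.

```shell
#!/bin/sh
# Hypothetical helper: print a public key prefixed with the restrictions
# shown above, as a single authorized_keys line.
#   $1 = forced command (must not contain double quotes)
#   $2 = path to the public key file
restrict_key() {
    printf 'command="%s",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty %s\n' \
        "$1" "$(cat "$2")"
}

# Possible use on the server, instead of the plain cat of backup.pub:
#   restrict_key 'rsync --server -v --timeout=999 --delete-excluded . backup/client' \
#       backup.pub >> authorized_keys
```

Because the helper writes to standard output, the result can be inspected before it is appended to authorized_keys.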

Create the target backup directory on the server.

rsync will not create the target directory ($HOME/backup/client) on the server; the target directory must be manually created.

server$ mkdir -p ~/backup/client

Create backup script dobackup.

Use the example script linked to, and localize it as needed.

To test the script, try prefixing the rsync call with echo to see what would be run, or add the -n option to rsync to see what it would copy.
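
One way to build the echo test into the script is a small wrapper function. The sketch below assumes a DRY_RUN variable and a run function, both invented for this example and not part of the linked script.

```shell
#!/bin/sh
# Hypothetical wrapper: with DRY_RUN=1 the command is printed, not executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Possible use inside the backup script:
#   run rsync -a --timeout=999 --delete-excluded "$HOME/" server:backup/client
```

Invoking the script as DRY_RUN=1 ~/bin/dobackup would then show the rsync command line without running it.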

Configure a ~/.rsync/exclude file to list files not to back up.

Cache files and other transitory data should be skipped. One way is via the --exclude-from=~/.rsync/exclude option, plus suitable file patterns in the exclude file. For more information, consult the EXCLUDE PATTERNS section in the rsync(1) man page.

$ cat ~/.rsync/exclude
.DS_Store
.mozilla/**/Cache
*.o

Run backup script via a crontab(5) entry.

In addition to automatic runs, the script can be run manually from the command line.

client$ crontab -l
@daily $HOME/bin/dobackup

client$ ~/bin/dobackup
…

Multiple rsync runs can accumulate should previous runs fail to exit in time. This can bring a system down. While the --timeout=999 option can help, a better solution is to ensure only a single copy of the rsync script can run. This could involve checking the running processes for a user account and looking for the command name, among other options.
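
Another of those options is a lock directory, since mkdir either creates the directory or fails in a single step. The following is a minimal sketch; the lock path and the acquire_lock name are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical lock location; override LOCKDIR in the environment if needed.
LOCKDIR="${LOCKDIR:-${TMPDIR:-/tmp}/dobackup.lock}"

# mkdir fails if the directory already exists, so only one run can succeed.
acquire_lock() {
    mkdir "$1" 2>/dev/null
}

if acquire_lock "$LOCKDIR"; then
    # Release the lock when the script exits normally.
    trap 'rmdir "$LOCKDIR"' EXIT
    # ... the rsync call would go here ...
else
    echo "dobackup: another run appears to hold $LOCKDIR" >&2
fi
```

A stale lock can remain if the script is killed before the trap runs, so the directory may occasionally need to be removed by hand.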

Mac OS X & the Hierarchical File System (HFS)

Mac OS X 10.4 (Tiger) supports the -E option to rsync, which copies extended filesystem attributes.

On previous versions of OS X, compile rsync with the rsync+hfsmode patch. Note that rsync may have trouble with symbolic links (ownerships and permissions) and sockets (perhaps -gHlopqrtx instead of -a).

`sudo` & Directory Permissions

rsync may fail should it lack permissions to change directories. This odd edge case usually occurs when one user runs rsync as another user via sudo, and that second user lacks permission to escape from the working directory of the first user:

$ sudo -u nobody rsync -avz /tmp/asdf /var/tmp
building file list ... pop_dir /tmp : Permission denied
rsync: writefd_unbuffered failed to write 26 bytes: phase "unknown": Broken pipe
rsync error: error in rsync protocol data stream (code 12) at io.c(515)

On the command line, this is mostly an annoyance; in scripts, perhaps more of a concern. The workaround is to change the working directory to a safe directory prior to running rsync:

cd / || exit 1
rsync …

A chdir(2) to / on Unix is a typical step in daemonizing a process, and may also be a good practice for unattended shell scripts, assuming the rest of the shell script uses fully qualified paths, or changes the working directory elsewhere as necessary.
