This LWN article is perhaps a bit confused, but it does point out a truth: on modern hardware, with fast local storage and large, often static files, rsync is often unnecessarily slow.

That's probably true to some extent when using rsync across a LAN, but it's especially true when using it to copy files locally. rsync runs as a client/server pair, and both processes MD5 checksum the files as they are being transferred. That's nice, but it's slow too.

The funny thing about rsync is that probably 50% of its uses don't involve the core feature it was written to provide: updating files by transferring only the differences. Other than ensuring data validity, that feature is the only reason it needs slow checksums. Instead, rsync is often chosen because of all the other awesome features that have been glommed onto it over the years.

Compared with rsync, cp is pretty laughable: it can't even exclude files by pattern. And rsync command lines are often developed by repeated trial and error, which works because rsync recovers well from being ctrl-c'd in the middle; cp does not. These kinds of things make a lot of us reach for rsync first, even when the situation does not involve incremental file changes. In almost any situation, one of rsync's 120-some options is sure to be just what you need...

So lots of scripts use rsync to synchronise directories, but the speedup obtained from the rsync algorithm is often small. (Correction: rsync turns off the delta-transfer algorithm by default for local-to-local transfers. It still does MD5 checksums, however.) Since rsync is reasonably fast, we generally don't care that it does these checksums, even though they probably slow it down on average. But once larger files, like videos, are involved, this starts to change. And when rsync is run on a typical home NAS with a slow ARM CPU, the picture changes entirely: the checksum overhead becomes unbearable, while the IO overhead is minimal.
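A rough way to see which side of that trade-off a given machine is on: time MD5 over an in-memory buffer and compare the result with the sequential speed of the disks involved. This is only a sketch using Python's hashlib, not rsync's own checksum code, so treat the number as a ballpark; the function name and its defaults are made up for illustration.

```python
import hashlib
import time


def md5_throughput_mib_s(total_mb=256, chunk_kb=128):
    """Time hashlib.md5 over total_mb MiB of in-memory data; return MiB/s.

    Only the hashing is measured (no disk IO), as a rough stand-in for the
    per-byte CPU cost of a checksum pass over a file.
    """
    chunk = bytes(chunk_kb * 1024)  # zero-filled buffer, reused every round
    n_chunks = max(1, (total_mb * 1024) // chunk_kb)
    h = hashlib.md5()
    start = time.perf_counter()
    for _ in range(n_chunks):
        h.update(chunk)
    elapsed = time.perf_counter() - start
    hashed_mib = n_chunks * chunk_kb / 1024
    return hashed_mib / elapsed
```

If print(md5_throughput_mib_s()) on the NAS comes out near or below what its drives can sustain, the checksum rather than the disk is what limits the copy.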

So, here's local-rsync. It takes all the same options as rsync, with the caveat that SOURCE and DEST must be the first two arguments, and must be local directories.

% local-rsync huge/ /media/usb/huge/ -a -v --exclude '*~' --delete --max-delete=100

It speeds up rsync in these situations by asking rsync which files need to be updated, and updating them the brute-force way, with cp. At the end, rsync is run to take care of the non-brute-force stuff (like deletions and file permissions).
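For the curious, here is a minimal Python sketch of that three-step idea. It is not the actual local-rsync code: the names files_to_copy and local_rsync are hypothetical, it assumes SOURCE means "the contents of this directory" (as with huge/ above), it finds changed files with rsync --dry-run --out-format='%i %n', and it uses shutil.copy2 in place of cp.

```python
import os
import shutil
import subprocess


def files_to_copy(itemized_lines):
    """Pick out regular files that an rsync dry run says it would transfer.

    Expects lines produced with --out-format='%i %n'. An itemize string
    starting with '>f' (received) or '<f' (sent) marks a regular file whose
    data would be copied; local-to-local runs normally report '>f'. Other
    lines (directories, '*deleting', -v chatter) are ignored.
    """
    paths = []
    for line in itemized_lines:
        flags, sep, name = line.partition(" ")
        if sep and flags[:2] in (">f", "<f"):
            paths.append(name)
    return paths


def local_rsync(src, dest, rsync_opts):
    """Hypothetical re-creation of the local-rsync idea, not the original."""
    # Assumption: SOURCE is copied as "contents of this directory"
    # (trailing slash), so the names rsync reports are relative to src.
    src = src.rstrip("/") + "/"

    # 1. Ask rsync what it would transfer, honouring the caller's options
    #    (excludes etc.), without transferring anything.
    dry = subprocess.run(
        ["rsync", "--dry-run", *rsync_opts, "--out-format=%i %n", src, dest],
        check=True, capture_output=True, text=True,
    )

    # 2. Copy those files the brute-force way, with no checksumming.
    #    copy2 keeps the modification time, so rsync's default quick check
    #    (size and mtime) should normally see them as up to date in step 3.
    for name in files_to_copy(dry.stdout.splitlines()):
        source = os.path.join(src, name)
        if not os.path.isfile(source):
            # e.g. a name rsync escaped in its output; step 3 still covers it
            continue
        target = os.path.join(dest, name)
        os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
        shutil.copy2(source, target)

    # 3. Let rsync handle everything else: deletions, permissions, and
    #    anything the brute-force pass skipped.
    subprocess.run(["rsync", *rsync_opts, src, dest], check=True)
```

Called as local_rsync("huge/", "/media/usb/huge/", ["-a", "-v", "--exclude", "*~", "--delete", "--max-delete=100"]), it mirrors the command line above. Because the final pass is a full rsync run, the brute-force step is purely an optimisation: if it misses a file, rsync still transfers it, just more slowly.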

On a fast CPU, local-rsync will probably speed up rsync of large, static files by a factor of 2. On a slow CPU, local-rsync is so fast, and rsync so slow, that I have not bothered to benchmark. :)

This is just a hack, of course, and one that an rsync option to skip the transfer checksums could easily replace. But I think it illustrates some interesting things about how a program's underlying assumptions about its environment can change over time, and how free software programs can accrete value until the original differentiating reason for their existence is not the most important thing about them anymore.