This LWN article is perhaps a bit confused, but it does point out a truth: On modern hardware, with fast local storage and large, often static files, rsync is often unnecessarily slow.
That's probably true to some extent when using rsync across a LAN, but it's especially true when using it to copy files locally. rsync runs as a client/server pair, and both processes compute MD5 checksums of the files as they are transferred. That's nice, but it's slow too.
Funny thing about rsync is that probably 50% of its uses don't involve the core feature it was written to provide: updating files by transferring only the differences. That feature, other than ensuring data validity, is the only reason it needs slow checksums. Instead, rsync is often chosen because of all the other awesome features that were glommed onto it over the past several decades.
Compared with rsync, cp is pretty laughable -- it can't even exclude files from being copied by patterns. And unlike rsync, cp command lines are not often developed by repeated trial and error -- cp does not recover well from being ctrl-c'd in the middle, while rsync does. These kinds of things make a lot of us reach for rsync first, even if the situation does not involve incremental file changes. In most any situation, one of rsync's 120-some options is sure to be just what you need...
So lots of scripts use rsync to synchronise directories, but the speedup obtained from the rsync algorithm is often low. (Correction: rsync turns off the delta-transfer algorithm by default for local-to-local transfers. It still does MD5 checksums, however.) Since rsync is reasonably fast, we generally don't care that it does these checksums, which on average probably slow it down. But if larger files, like videos, are involved, this starts to change. When rsync is run on a typical home NAS, with a slow (ARM) CPU, the picture changes entirely -- now the checksum overhead is unbearable, while the IO overhead is minimal.
So, here's local-rsync. It takes all the same options as rsync, with the caveat that SOURCE and DEST must be the first two arguments, and must be local directories.
% local-rsync huge/ /media/usb/huge/ -a -v --exclude '*~' --delete --max-delete=100
It speeds up rsync in these situations by querying rsync (with a dry run) to find which files need to be updated, and then updating them the brute-force way, with cp. At the end, rsync is run to take care of the non-brute-force stuff (like deletions and file permissions).
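Roughly, the idea looks something like the following. This is only a simplified sketch of the approach, not the actual local-rsync script: the --out-format='%n' trick, the shape of the loop, and the lack of any corner-case handling are my own guesses at one way it could be done.

    #!/bin/sh
    # Sketch of the local-rsync idea (not the real script); assumes SOURCE
    # and DEST are local directories and come first on the command line.
    set -e
    src="$1"; shift
    dest="$1"; shift

    # Dry-run rsync and print the name (%n) of each file it would transfer,
    # then copy those the brute-force way with plain cp -- no checksums.
    rsync --dry-run --out-format='%n' "$@" "$src" "$dest" |
    while read -r f; do
        if [ -f "$src/$f" ]; then
            mkdir -p "$dest/$(dirname "$f")"
            cp "$src/$f" "$dest/$f"
        fi
    done

    # A final real rsync pass handles everything cp does not:
    # deletions, permissions, ownership, symlinks, and so on.
    rsync "$@" "$src" "$dest"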
On a fast CPU, local-rsync will probably speed up rsync of large, static files by a factor of 2. On a slow CPU, local-rsync is so fast, and rsync so slow, that I have not bothered to benchmark. :)
This is just a hack (easily replaced by a --no-checksum option in rsync, of course). But I think it illustrates some interesting things about how a program's underlying assumptions about its environment can change over time, and how free software programs can accrete value until the original differentiating reason for their existence is not the most important thing about them anymore.
Rsync always checksums. Please see the man page if you don't believe me.
The checksum in question is an MD5 sum. There is also a second, rolling checksum used by the rsync delta-transfer algorithm. Apparently it does both.
And there is a third one, which has to be enabled with --checksum, to better detect whether an existing file has changed. But that one is not really relevant here.
Liw: I think -W might be the magic option I was looking for! Hidden among the hundred or so other magic options. :)
Madduck: Actually, I've been doing all my testing on an N2100, although disk writes have been going to a USB disk. Still, rsync with checksumming is much, much slower than just blasting the bits.
Regarding -W, the man page says it's the default for local paths. Since rsync is, in my experience, still CPU-bound on local paths, -W must not be disabling all the checksums. Probably rsync is still doing the MD5 sum that it uses as a whole-file consistency check; -W may only disable the rolling checksum.
A code dive is in order..
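In the meantime, a rough and unscientific way to poke at that theory might be a timing comparison like the one below. The /tmp/scratch-* destinations are just throwaway placeholders; -W/--whole-file disables the delta-transfer algorithm, so whatever CPU time rsync still burns beyond what cp uses is presumably the whole-file MD5.

    % mkdir /tmp/scratch-rsync /tmp/scratch-cp       # throwaway destinations
    % time rsync -a -W huge /tmp/scratch-rsync/      # whole-file mode, no delta-transfer
    % time cp -a huge /tmp/scratch-cp/               # no checksums at all

If -W really turned off every checksum, the two copies should take about the same time; any consistent gap on a slow CPU presumably points at the remaining MD5 work.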
There are a number of problems with your script. For example, it breaks --backup: since you're deleting/overwriting the target file with the brute-force cp first, the old version is lost and can no longer be backed up. Emulating this is next to impossible. Also, in the --dry-run line, adding slashes to the src arguments alters rsync's behaviour (the slashes could easily be taken away). And if the source or target is actually a file (as opposed to a directory), the script breaks, while rsync alone does not.
Nice idea, but better not to rely on this script; take a code dive instead :-)
@madduck: is there a way to make rsync behave like that? I'm backing up my media library, and sometimes only one song's tag changes; rsync has to checksum all the files and, as mentioned in the other article, read all 80 gigs of my library to do so. On a USB drive, through SSH. Yeah. Any ideas?