Using getdelta to reduce size of distfiles download
Yesterday, Stefan Schweizer (a developer who I work with) brought a very cool thing to my attention: getdelta.
getdelta is based on deltup, which is a way of storing and applying differences between files. It's kind of like diff and patch, but designed for binary files such as compressed tarballs.
deltup is very useful for upgrading distfiles, because typically very little changes between foobar-0.1 and foobar-0.2. A deltup diff file that upgrades foobar-0.1.tar.bz2 to foobar-0.2.tar.bz2 is therefore a much smaller download than the entire foobar-0.2.tar.bz2 file.
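To make the idea concrete, here's a toy sketch in Python of a binary delta: record which byte ranges of the old file can be reused, and ship only the bytes that are genuinely new. This is not deltup's actual algorithm or patch format; the function names (make_delta, apply_delta, literal_bytes) are made up for illustration, and it uses the standard library's difflib, which is far too slow for real tarballs.

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes) -> list:
    """Describe `new` as a list of copy-from-old and literal-data operations."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # This range already exists in the old file: store offsets only.
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # These bytes exist only in the new file: store them literally.
            ops.append(("data", new[j1:j2]))
        # "delete" ranges are simply never copied, so nothing is stored.
    return ops


def apply_delta(old: bytes, ops: list) -> bytes:
    """Rebuild the new file from the old file plus the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


def literal_bytes(ops: list) -> int:
    """Count the bytes that actually have to be downloaded as literal data."""
    return sum(len(op[1]) for op in ops if op[0] == "data")
```

When two releases share most of their content, literal_bytes(make_delta(old, new)) comes out much smaller than len(new), and that gap is the whole point of downloading a delta instead of the full file.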
The magic of getdelta is that it integrates into portage for downloading your distfiles. As an example, I have subversion-1.1.1.tar.bz2 present in /usr/portage/distfiles, but I now want to upgrade to version 1.1.3.
dsd ~ # emerge -f subversion
Calculating dependencies …done!
>>> emerge (1 of 1) dev-util/subversion-1.1.3-r1 to /
>>> Downloading http://gentoo.blueyonder.co.uk/distfiles/subversion-1.1.3.tar.bz2
Searching for a previously downloaded file in /usr/portage/distfiles
We have following candidates to choose from
The best of all is … subversion-1.1.1.tar.bz2
Checking if this file is OK.
Trying to download subversion-1.1.1.tar.bz2-subversion-1.1.3.tar.bz2.dtu
[...snip the wget download verbosity...]
17:41:57 (490.22 KB/s) – `subversion-1.1.1.tar.bz2-subversion-1.1.3.tar.bz2.dtu’ saved 
Successfully fetched the dtu-file – let’s build subversion-1.1.3.tar.bz2…
subversion-1.1.1.tar.bz2 -> subversion-1.1.3.tar.bz2: OK
This dtu-file saved 5 MB (87%) download size.
>>> subversion-1.1.3.tar.bz2 size ;-)
>>> subversion-1.1.3.tar.bz2 MD5 ;-)
>>> md5 src_uri ;-) subversion-1.1.3.tar.bz2
The above process basically found that I already had a previous subversion tarball downloaded, so it just downloaded the upgrade deltup patch from the getdelta server, which saved 87% of my download for subversion-1.1.3. You'll notice that portage does its usual md5 checking independent of the deltup process, so I don't see any possible problems relating to binaries being hijacked.
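The reason the independent md5 check matters is that it doesn't care how the tarball was produced: a reconstructed file either hashes to the expected value or it doesn't. Here's a minimal sketch of that kind of check in Python. It is not portage's code, and the function names md5_of_file and matches_expected are hypothetical.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large tarballs never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_md5: str) -> bool:
    """True if the file's MD5 equals the expected digest (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

A tarball rebuilt from a bad or tampered patch would fail this comparison exactly as a corrupted full download would.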
I did a lot of package upgrades yesterday and for every one where I had an older distfile present and I was watching, getdelta typically saved me 70-95% of the downloading, and only once refused to download the deltup patch because it was bigger than the tarball it was going to construct.
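The numbers getdelta reports come down to simple arithmetic, and so does the decision to refuse an oversized patch. Here's a rough sketch of both in Python. The function names are hypothetical, and the exact rule getdelta uses is an assumption based on the behaviour I saw.

```python
def savings_percent(full_size: int, delta_size: int) -> float:
    """Percentage of the full download avoided by fetching the delta instead."""
    if full_size <= 0:
        raise ValueError("full_size must be positive")
    return 100 * (full_size - delta_size) / full_size


def worth_fetching_delta(full_size: int, delta_size: int) -> bool:
    """Assumed rule: only use the patch if it is smaller than the tarball."""
    return delta_size < full_size
```

For example, a patch that is 13% of the size of the full tarball gives savings_percent(1000, 130) == 87.0, which matches the 87% reported for subversion above.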
The only disadvantage of this is that constructing the new file can sometimes be time-consuming, especially for the big files. Here's an example where I construct linux-22.214.171.124.tar.bz2 (~35 MB) from the 2.6.10 distribution:
linux-2.6.10.tar.bz2 -> linux-126.96.36.199.tar.bz2: OK
Is this any quicker than downloading the whole thing? Perhaps not while I'm staying at uni, where I can get 500kb/sec from UK mirrors, but at home, where I'm on standard/unreliable broadband, it definitely would be. If you are a dialup user, this is a godsend.
It seems that there are plans to at least trial this as an official Gentoo service, but this is a very nice preview of what might be coming up. To get started, just emerge getdelta and follow the basic instructions to set a new FETCHCOMMAND in make.conf.