Speeding Up SVN Checkout for Large Repositories

After moving our gigantic SVN repo to a new server, we wanted to speed it up. Note that some of these recommendations are peculiar to using the svn+ssh:// protocol. If you’re serving SVN via Apache or something, you might need very different advice.

Here are all the things we changed on the server to speed up SVN. Note that these are in no particular order… it’s hard to say what will give you the biggest bang for your buck. That said, all of these are pretty cheap gags, so if checkout time is a priority, you might as well try them all!

  1. Ensure you’re running the latest version of Subversion. At the time of this writing, that means v1.9, which offers loads of performance improvements over v1.6 that we were running!
  2. Ensure you’re only checking out the files you really need. If you can get by checking out only some directories in the repo, rather than the whole thing, you might consider doing so. (For our usage—we only store art assets in SVN—this is just fine, because there are no real dependencies between subdirectories.)
  3. Set up a cron job to periodically run svnadmin pack on your repository as a way of reducing fragmentation of your repository’s shards.
  4. Upgrade to hosting the repo on an SSD. We found we were I/O bound by our spinning rust hard drives.
  5. Ensure your uplink speed is reasonable. Doing all of the above, we were saturating our old-as-hell 10 Mbit uplink (pushing 1.25 MB/sec isn’t hard off an SSD!).
  6. Try disabling compression (we do so in our svnserve wrapper script, seen below). By default, SVN uses compression level 5 (on a scale from 0 to 9). If you primarily store binary files that don’t benefit from compression, or you have a fast connection, this might be a win. In our case, our server’s CPU was pegged at 100% during a checkout; dropping compression removed the CPU as a bottleneck.
  7. If you’re using the svn+ssh:// protocol:
    1. Ensure you’re running the latest versions of OpenSSH and OpenSSL. Aside from being security holes, old versions have serious performance issues. (I’m ashamed to say our SVN server was running a 6 year old version of both tools! This is what we get for not having anyone “own” the maintenance on this server.)
    2. In your sshd configuration file (for our Ubuntu installation, this was located at /etc/ssh/sshd_config), disallow “old and busted” ciphers. If your server defaults to 3DES, it’s another security risk plus performance disaster. (It’s quite a bit slower than AES, which typically benefits from hardware acceleration.) You should have a line in the file that looks like this:
      Ciphers aes128-gcm@openssh.com,aes128-ctr,aes256-gcm@openssh.com,aes256-ctr,aes192-ctr
      Don’t forget to restart your SSH server after making the change!
  8. If you’re using the svn:// protocol: Bump the size of SVN’s in-memory cache. It defaults to something like 16 MB, but if you’re running a dedicated server on halfway reasonable hardware, you can allocate way more than that. To do this, you’ll have to be using an svnserve wrapper. That is, you’ll have to have a custom shell script—something like /usr/local/bin/svnserve—that modifies the arguments that ssh+svn-connected users pass to svnserve. Something like this:
    # Set the umask so files are group-writable
    umask 002
    # Call the 'real' svnserve, also passing in the default repo location
    # Use a 4096 MB in-memory cache size, allowing both deltas and full texts of commits to be cached
    exec /usr/bin/svnserve "$@" --compression 0 --memory-cache-size=4096 --cache-txdeltas yes --cache-fulltexts yes
  9. If all of the above isn’t enough, it may be because you have a large number of files in your directories. (Modern OSes get less efficient as you get to an extremely large number of files.) You might consider re-sharding your repo, but note that this will take a long time (I’ve heard to expect 1 hour per GB to dump, then another hour per GB to reload).

After making the above changes (except re-sharding) on our svn+ssh:// server, we were able to go from an average download speed of about 100 kilobytes/second to about 6 megabytes/secon—not bad at all!

Migrating a Large SVN Repository to a New Server

We ran into a situation at work where we needed to move our SVN repo from an old Linux server (running Ubuntu 10.04, in 2017!!) to a shiny new cloud instance.

Jump to the TL;DR

Dear God, don’t do this

The standard advice you see on the web is to use svn dump on the old server, transfer the dump file, then use svn load on the new server. That works fine for small repos, where the total transfer time will be negligible no matter how you do it, but for a large repo, it’s a disaster. Time estimates I’ve seen for this around the web say it’ll take roughly 1 hour per GB to dump, then another hour per GB to load… not to mention the fact that the dump files are larger overall than the raw filesystem data, so the transfer itself is slower!

Using rsync

Not wanting to spend ~150 hours on this for our 70 GB repo, I wanted to try moving just the raw filesystem data. This StackOverflow answer indicated such a thing could work using scp, but of course there are two scary things that could happen:

  1. What happens if you have a network connection hiccup mid-transfer? (Nobody wants to start over!)
  2. What happens if somebody adds a new commit while you’re working? (You could, at an organizational level, ask for a “lock” for the hours you need to do the transfer, but it’d be nice not to impede everyone’s work for that long.)

The solution, of course, is to use rsync! You run it once to transfer your directory initially, then obtain a lock; then you run it a second time to pick up any changes you missed from the first (long) transfer. Then it’s just a matter of upgrading the repo and preparing it for use.

So, from beginning to end, the complete steps are:

  1. On your old server:
    $ rsync -aPz /path/to/svn-repo/ username@newServer:/destination/path/
  2. Email your team to get a “lock” (no more committing to the old server!)
  3. Once more:
    $ rsync -aPz /path/to/svn-repo/ username@newServer:/destination/path/
    (picking up any changes that were made during the first copy)
  4. On the new server:
    $ svnadmin upgrade /destination/path/
    $ svnadmin verify /destination/path/