Conference Talk Playlists for C++ & Game Developers

I’ve put together a number of conference talk playlists for my own “professional development.” This is a list curated by going through hundreds of talks in the GDC and CppCon archives.

I’ve tried to keep overlap between lists to a minimum. I’ve not watched everything (erm… obviously!), but in cases where I have watched a talk, I’ve only left it on the list if I actually got something out of it.

If you have your own playlists to share, by all means, drop a comment below!


Git Cheat Sheet

These are a few of the Git commands I find myself looking up all the time. (Feel free to drop your own suggestions in the comments!)

  • Show the staged changes:
    $ git diff --cached

    Also aliased as:

    $ git diff --staged
  • Find the most recent common ancestor of two Git branches:
    $ git merge-base [branch1] [branch2]
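  • Find which branch a commit was originally created on (this relies on the reflog, so it only works in the clone where the commit was made, and only until the reflog entry expires):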
    $ git reflog show --all | grep [commit SHA]
  • Find all branches that a commit is on (or that a branch has been merged into):
    $ git branch -a --contains [commit SHA or branch name]
  • Get the diff between two branches:
    $ git diff [branch1]..[branch2]
  • Undo/remove a Git commit that has not been pushed (scary fine print: --hard also throws away any uncommitted changes in your working tree):
    $ git reset --hard HEAD^


  • Extracting a Git subdirectory into its own submodule

Esoteric Data Structures and Where to Find Them

This is a summary of the CppCon 2017 talk given by Allan Deutsch.

Slot map

  • An unordered, associative container similar to a hash map
  • Assigns items a unique identifier (unlike a hash map, where you provide an identifier)
  • Similar to a pool allocator, but with a map-like interface
  • Advantages over hash map:
    • True constant time lookup, erase, and insert (except when reallocating to grow the size)
    • Contiguous storage (for improved traversal/cache lines)
    • Reuses memory—if you delete a bunch of items, then insert new ones, it doesn’t result in a bunch of new memory allocations
    • Avoids the ABA problem, i.e.,
      1. You insert an item and receive a key
      2. You delete the item
      3. You insert a new item with the same key
      4. You attempt to access the item pointed to by the old key, but get the new element (in a slot map, the old key's generation no longer matches, so the lookup fails instead)
  • Disadvantages compared to hash maps:
    • More memory use, since it dynamically allocates blocks like a vector
    • Because it uses contiguous storage, memory addresses of elements are unstable, and lookup is slower than a raw pointer
    • Requires a small amount of extra memory
  • Potential use cases:
    • Storing game entities
    • Any place you’d like a map-like interface, but with constant performance and support for as-fast-as-possible iteration
  • How it works:
    • Array of “slots” (keys we give the user) which indicate each item’s index and the “generation” (the number of times this slot has been used)
    • Array of data, which gets pointed to by the slots
    • Free list head & tail, indicating which slots should be filled next
  • Variants
    • No separate index table
      • Pros:
        • Stable indices/memory addresses
        • Lookups require only 1 indirection, not 2
      • Cons:
        • Slower iteration (since elements aren’t densely packed)
    • Constant size array
      • Pros:
        • No reallocations (so insert is always constant time)
          • Also means no memory usage spikes (where memory usage temporarily doubles as we allocate a new block and transfer the old one over)
        • Generations increase roughly uniformly
      • Cons:
        • Dynamic sizing is important for most use cases
    • Block allocation
      • Pros:
        • Constant inserts (like the constant-size array)
        • Spikes in memory due to reallocations are smaller
        • Iteration speeds are similar to original
      • Cons:
        • Elements aren’t fully contiguous, so iteration will always be some degree slower
        • More cache misses (scales inversely with block size—bigger blocks mitigate this!)
        • Adds a third indirection in lookup
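
The "How it works" bullets above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation from the talk: the class name SlotMap, its method names, and the extra _data_slot back-reference array (used to fix up a slot when erase moves the last element into the hole) are all assumptions made for this example, and a deque stands in for the free list head & tail.

```python
from collections import deque


class SlotMap:
    """Minimal slot map sketch. Keys are (slot index, generation) pairs."""

    def __init__(self):
        self._slots = []       # slot -> [index into _data, generation]
        self._data = []        # densely packed values (fast iteration)
        self._data_slot = []   # data index -> slot that owns it (assumed helper)
        self._free = deque()   # free slots, reused in FIFO order

    def insert(self, value):
        if self._free:
            slot = self._free.popleft()
        else:
            slot = len(self._slots)
            self._slots.append([0, 0])
        self._slots[slot][0] = len(self._data)
        self._data.append(value)
        self._data_slot.append(slot)
        return (slot, self._slots[slot][1])

    def _index(self, key):
        slot, generation = key
        # A stale key carries an old generation, so it is rejected here.
        if slot >= len(self._slots) or self._slots[slot][1] != generation:
            raise KeyError(key)
        return self._slots[slot][0]

    def get(self, key):
        return self._data[self._index(key)]

    def erase(self, key):
        index = self._index(key)
        slot = key[0]
        last = len(self._data) - 1
        # Move the last element into the hole so the data stays contiguous.
        self._data[index] = self._data[last]
        moved_slot = self._data_slot[last]
        self._data_slot[index] = moved_slot
        self._slots[moved_slot][0] = index
        self._data.pop()
        self._data_slot.pop()
        # Bump the generation so old keys to this slot stop working.
        self._slots[slot][1] += 1
        self._free.append(slot)

    def values(self):
        return iter(self._data)


entities = SlotMap()
goblin = entities.insert("goblin")
orc = entities.insert("orc")
entities.erase(goblin)
troll = entities.insert("troll")  # reuses the goblin's slot with a new generation
```

Here the troll key shares a slot index with the old goblin key but has a different generation, so looking up the stale goblin key raises KeyError instead of silently returning the troll.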

Bloom filters

  • A probabilistic, super-duper memory efficient hash “set,” used primarily for short-circuiting otherwise expensive computations
  • Find results are only “definitely not” and “maybe”
  • Supports find and insert operations only. No actual data elements are stored (since, as the NSA informed us, metadata is not real data! 😉 ), so there’s no such thing as a “get” op
    • This is especially useful for privacy purposes
  • How it works
    • On insert, run the object through K (fast) hash functions; each hash function sets one bit in the Bloom filter’s bit field
    • On find, hash the object and see if each of the bits you get from your hash functions is set; if they all are, the item in question was probably inserted in the past
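    • Increasing the number of hash functions decreases your false positive probability only up to a point (for m bits and n inserted items, the optimum is roughly K = (m/n)·ln 2); past that, the bit field fills up and false positives rise again. Driving the rate lower for the same number of items means spending more bits.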
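
As a rough sketch of the insert/find steps above (not code from the talk): the class name BloomFilter, the default sizes, and the trick of deriving K hash functions by prefixing SHA-256 with a counter byte are all assumptions chosen to keep the example short. A real filter would use much faster non-cryptographic hashes.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: find() answers 'definitely not' or 'maybe'."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        data = item.encode("utf-8")
        for i in range(self.num_hashes):
            # Stand-in for K independent hash functions: salt SHA-256 with i.
            digest = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def insert(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely not inserted"; True only means "maybe".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


seen = BloomFilter()
seen.insert("alice@example.com")
if seen.might_contain("alice@example.com"):
    pass  # only now pay for the expensive, authoritative lookup
```

Note that only bits are stored, never the strings themselves, which is why there is nothing to "get" back out.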

Navigation meshes

  • Representation of a search space (used for pathfinding) that reduces the number of nodes necessary to represent the traversable area
  • Grid representations suck: They don’t map well onto non-square shapes, and you typically need a lot of cells to cover the area
  • Triangle nav meshes improve upon this; using a mix of triangles and squares can improve it even further
  • Consider: [image: Screen Shot 2017-11-02 at 4.06.26 PM.png]
  • To actually traverse this, you would move from edge to edge to get between nodes, and when you reach your destination node, you could simply beeline to your goal (since you know it will be free of obstacles)
  • You can use Recast to create nav meshes, and A* or the like to search them
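
To make the "search the mesh with A*" step concrete, here is a small sketch. It assumes a simplification that the talk summary does not specify: each nav mesh polygon is reduced to its center point, and the cost of moving between adjacent polygons is the straight-line distance between centers (a real implementation would typically path through shared edges instead). The function name a_star and the toy four-polygon mesh are made up for this example.

```python
import heapq
import math


def a_star(centers, neighbors, start, goal):
    """Return the list of polygons from start to goal, or None if unreachable."""

    def dist(a, b):
        return math.dist(centers[a], centers[b])

    # Heap entries: (estimated total cost, cost so far, polygon, path taken)
    frontier = [(dist(start, goal), 0.0, start, [start])]
    best_cost = {start: 0.0}
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if cost > best_cost.get(node, math.inf):
            continue  # a cheaper route to this polygon was already expanded
        for nxt in neighbors[node]:
            new_cost = cost + dist(node, nxt)
            if new_cost < best_cost.get(nxt, math.inf):
                best_cost[nxt] = new_cost
                estimate = new_cost + dist(nxt, goal)
                heapq.heappush(frontier, (estimate, new_cost, nxt, path + [nxt]))
    return None


# Hypothetical mesh: polygon name -> center point, plus adjacency lists
centers = {"A": (0, 0), "B": (2, 0), "C": (4, 0), "D": (2, 2)}
neighbors = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["A", "C"]}
route = a_star(centers, neighbors, "A", "C")
```

Because a nav mesh covers the same area with far fewer nodes than a grid, the frontier here stays tiny, which is the whole point of the representation.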

Hash pointers

  • A combination of a pointer and a hash of the thing being pointed to
  • Allows you to verify that the data hasn’t been changed since you got the pointer to it
  • Most common implementation is in Merkle trees in cryptocurrency blockchains
  • Tamper detection: suppose the data in a block has been changed maliciously in a blockchain (a series of hash pointers + data). In order for the attacker to “get away with it,” they’ll have to change the data in Block X, plus the hash in Block Y that points to Block X. But Block Z stores a hash pointer to Block Y, and its hash covered the now-altered hash in Block Y, so the attacker also has to change the hash in Block Z (and so on up the chain).
    • Verification is O(N), for N number of blocks
  • Alternative: Merkle trees
    • Tree structure where only the leaves hold data, and all other nodes simply store 2 hash pointers to 2 children
    • Verifying that a single leaf belongs to the tree takes O(log(N)) time, but that also means an attacker only has to alter O(log(N)) hashes along the path to the root
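
Both structures can be sketched briefly. The block layout (a dict with "data" and "prev_hash"), the helper names, and the choice of SHA-256 are assumptions for illustration. Duplicating the last hash on odd-sized levels is one common Merkle tree convention, not something the talk summary specifies.

```python
import hashlib


def sha256_hex(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def block_hash(block):
    # The hash covers the previous block's hash too, which chains the blocks.
    return sha256_hex(block["prev_hash"] + block["data"])


def build_chain(payloads):
    """Return (chain, head_hash); each block stores a hash of its predecessor."""
    chain, prev_hash = [], ""
    for data in payloads:
        block = {"data": data, "prev_hash": prev_hash}
        chain.append(block)
        prev_hash = block_hash(block)
    return chain, prev_hash


def verify_chain(chain, head_hash):
    """O(N) walk: recompute every hash and compare with the stored pointers."""
    prev_hash = ""
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        prev_hash = block_hash(block)
    return prev_hash == head_hash


def merkle_root(leaves):
    """Root hash of a Merkle tree over a non-empty list of leaf strings."""
    level = [sha256_hex(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # pad odd levels by repeating the last hash
        level = [
            sha256_hex(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


chain, head = build_chain(["block X", "block Y", "block Z"])
chain[0]["data"] = "tampered"       # change Block X only
tampered_ok = verify_chain(chain, head)  # False: Block Y's pointer no longer matches
```

As long as the verifier keeps a trusted copy of the head hash (or the Merkle root), any change further down shows up as a mismatch.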

Unicode Characters in App Store Descriptions

In the last few years, Apple has cracked down on the special characters they allow in App Store descriptions. This is a list of all Unicode characters I’m aware of that they allow as of July 2017. (I suggest copying & pasting these into your own app descriptions.)

• (bullet)

‣ (small triangular bullet)

◂ (black left-pointing small triangle)

▶︎ (black right-pointing triangle)

◀︎ (black left-pointing triangle)

◆ (black diamond)

√ (square root symbol)

If you’re smart, you’ll use these to improve the readability (especially the scannability) of your copy. Since Apple doesn’t allow things like bold, italics, headings, etc., you can make do with some combination of symbols.

Speeding Up SVN Checkout for Large Repositories

After moving our gigantic SVN repo to a new server, we wanted to speed it up. Note that some of these recommendations are peculiar to using the svn+ssh:// protocol. If you’re serving SVN via Apache or something, you might need very different advice.

Here are all the things we changed on the server to speed up SVN. Note that these are in no particular order… it’s hard to say what will give you the biggest bang for your buck. That said, all of these are pretty cheap gags, so if checkout time is a priority, you might as well try them all!

  1. Ensure you’re running the latest version of Subversion. At the time of this writing, that means v1.9, which offers loads of performance improvements over the v1.6 we were running!
  2. Ensure you’re only checking out the files you really need. If you can get by checking out only some directories in the repo, rather than the whole thing, you might consider doing so. (For our usage—we only store art assets in SVN—this is just fine, because there are no real dependencies between subdirectories.)
  3. Set up a cron job to periodically run svnadmin pack on your repository as a way of reducing fragmentation of your repository’s shards.
  4. Upgrade to hosting the repo on an SSD. We found we were I/O bound by our spinning rust hard drives.
  5. Ensure your uplink speed is reasonable. Doing all of the above, we were saturating our old-as-hell 10 Mbit uplink (pushing 1.25 MB/sec isn’t hard off an SSD!).
  6. Try disabling compression (we do so in our svnserve wrapper script, seen below). By default, SVN uses compression level 5 (on a scale from 0 to 9). If you primarily store binary files that don’t benefit from compression, or you have a fast connection, this might be a win. In our case, our server’s CPU was pegged at 100% during a checkout; dropping compression removed the CPU as a bottleneck.
  7. If you’re using the svn+ssh:// protocol:
    1. Ensure you’re running the latest versions of OpenSSH and OpenSSL. Aside from being security holes, old versions have serious performance issues. (I’m ashamed to say our SVN server was running a 6 year old version of both tools! This is what we get for not having anyone “own” the maintenance on this server.)
    2. In your sshd configuration file (for our Ubuntu installation, this was located at /etc/ssh/sshd_config), disallow “old and busted” ciphers. If your server defaults to 3DES, it’s another security risk plus performance disaster. (It’s quite a bit slower than AES, which typically benefits from hardware acceleration.) You should have a line in the file that looks like this:
      Don’t forget to restart your SSH server after making the change!
  8. If you’re using the svn:// protocol: Bump the size of SVN’s in-memory cache. It defaults to something like 16 MB, but if you’re running a dedicated server on halfway reasonable hardware, you can allocate way more than that. To do this, you’ll have to be using an svnserve wrapper. That is, you’ll have to have a custom shell script (something like /usr/local/bin/svnserve) that modifies the arguments that svn+ssh-connected users pass to svnserve. Something like this:
    #!/bin/sh
    # Set the umask so files are group-writable
    umask 002
    # Call the 'real' svnserve, forwarding the caller's arguments and disabling compression
    # Use a 4096 MB in-memory cache size, allowing both deltas and full texts of commits to be cached
    exec /usr/bin/svnserve "$@" --compression 0 --memory-cache-size=4096 --cache-txdeltas yes --cache-fulltexts yes
  9. If all of the above isn’t enough, it may be because you have a large number of files in your directories. (Modern OSes get less efficient as you get to an extremely large number of files.) You might consider re-sharding your repo, but note that this will take a long time (I’ve heard to expect 1 hour per GB to dump, then another hour per GB to reload).

After making the above changes (except re-sharding) on our svn+ssh:// server, we were able to go from an average download speed of about 100 kilobytes/second to about 6 megabytes/second. Not bad at all!

Migrating a Large SVN Repository to a New Server

We ran into a situation at work where we needed to move our SVN repo from an old Linux server (running Ubuntu 10.04, in 2017!!) to a shiny new cloud instance.


Dear God, don’t do this

The standard advice you see on the web is to use svnadmin dump on the old server, transfer the dump file, then use svnadmin load on the new server. That works fine for small repos, where the total transfer time will be negligible no matter how you do it, but for a large repo, it’s a disaster. Time estimates I’ve seen for this around the web say it’ll take roughly 1 hour per GB to dump, then another hour per GB to load… not to mention the fact that the dump files are larger overall than the raw filesystem data, so the transfer itself is slower!

Using rsync

Not wanting to spend ~150 hours on this for our 70 GB repo, I wanted to try moving just the raw filesystem data. This StackOverflow answer indicated such a thing could work using scp, but of course there are two scary things that could happen:

  1. What happens if you have a network connection hiccup mid-transfer? (Nobody wants to start over!)
  2. What happens if somebody adds a new commit while you’re working? (You could, at an organizational level, ask for a “lock” for the hours you need to do the transfer, but it’d be nice not to impede everyone’s work for that long.)

The solution, of course, is to use rsync! You run it once to transfer your directory initially, then obtain a lock; then you run it a second time to pick up any changes you missed from the first (long) transfer. Then it’s just a matter of upgrading the repo and preparing it for use.

So, from beginning to end, the complete steps are:

  1. On your old server:
    $ rsync -aPz /path/to/svn-repo/ username@newServer:/destination/path/
  2. Email your team to get a “lock” (no more committing to the old server!)
  3. Once more:
    $ rsync -aPz /path/to/svn-repo/ username@newServer:/destination/path/
    (picking up any changes that were made during the first copy)
  4. On the new server:
    $ svnadmin upgrade /destination/path/
    $ svnadmin verify /destination/path/