Esoteric Data Structures and Where to Find Them

This is a summary of the CppCon 2017 talk given by Allan Deutsch.

Slot map

  • An unordered, associative container similar to hash map
  • Assigns items a unique identifier (unlike a hash map, where you provide an identifier)
  • Similar to a pool allocator, but with a map-like interface
  • Advantages over hash map:
    • True constant time lookup, erase, and insert (except when reallocating to grow the size)
    • Contiguous storage (for improved traversal/cache lines)
    • Reuses memory—if you delete a bunch of items, then insert new ones, it doesn’t result in a bunch of new memory allocations
    • Avoids the ABA problem, i.e.,
      1. You insert an item and receive a key
      2. You delete the item
      3. You insert a new item with the same key
      4. You attempt to access the item pointed to by  the old key, but get the new element
  • Disadvantages compared to hash maps:
    • More memory use, since it dynamically allocates blocks like a vector
    • Because it uses contiguous storage, memory addresses of elements are unstable, and lookup is slower than a raw pointer
    • Requires a small amount of extra memory
  • Potential use cases:
    • Storing game entities
    • Any place you’d like a map-like interface, but with constant performance and support for as-fast-as-possible iteration
  • How it works:
    • Array of “slots” (keys we give the user) which indicate each item’s index and the “generation” (the number of times this slot has been used)
    • Array of data, which gets pointed to by the slots
    • Free list head & tail, indicating which slots should be filled next
  • Variants
    • No separate index table
      • Pros:
        • Stable indices/memory addresses
        • Lookups only requires 1 indirection, not 2
      • Cons:
        • Slower iteration (since elements aren’t densely packed)
    • Constant size array
      • Pros:
        • No reallocations (so insert is always constant time)
          • Also means no memory usage spikes (where memory usage temporarily doubles as we allocate a new block and transfer the old one over)
        • Generations increase roughly uniformly
      • Cons:
        • Dynamic sizing is important for most use cases
    • Block allocation
      • Pros:
        • Constant inserts (like the constant-size array)
        • Spikes in memory due to reallocations are smaller
        • Iteration speeds are similar to original
      • Cons:
        • Elements aren’t fully contiguous, so iteration will always be some degree slower
        • More cache misses (scales inversely with block size—bigger blocks mitigate this!)
        • Adds a third indirection in lookup

Bloom filters

  • A probabilistic, super-duper memory efficient hash “set,” used primarily for short-circuiting otherwise expensive computations
  • Find results are only “definitely not” and “maybe”
  • Supports find and insert operations only—no actual data elements is stored (since, as the NSA informed us, metadata is not real data! 😉 ), so there’s such thing as a “get” op
    • This is especially useful for privacy purposes
  • How it works
    • On insert, take K (fast) hash functions, each hash function sets one bit in the Bloom filter’s bit field
    • On find, hash the object and see if each of the bits you get from your hash functions are set; if they are, the item in question was probably inserted in the past
    • Increasing the number of hash functions decreases your false positive probability (but increases the memory usage)

Navigation meshes

  • Representation of a search space (used for pathfinding) that reduces the number of nodes necessary to represent the traversable area
  • Grid representations suck: They don’t map well onto non-square shapes, and you typically use a lot of them
  • Triangle nav meshes improve upon this; using a mix of triangles and squares can improve it even further
  • Consider:Screen Shot 2017-11-02 at 4.06.26 PM.png
  • To actually traverse this, you would move from edge to edge to get between nodes, and when you reach your destination node, you could simply beeline to your goal (since you know it will be free of obstacles)
  • You can use Recast to create nav meshes, and A* or the like to search them

Hash pointers

  • A combination of a pointer and a hash of the thing being pointed to
  • Allows you to verify that the data hasn’t been changed since you got the pointer to it
  • Most common implementation is in Merkle trees in cryptocurrency blockchains
  • Tamper detection: suppose the data in a block has changed maliciously in a blockchain (a series of hash pointers + data). In order for the attacker to “get away with it,” they’ll have to change the data in Block X, plus the hash in Block Y that points to Block X. But, Block Z stored a hashpointer to Block X as well, and its hash included the now-altered hash in Block X, so the attacker has to also change the hash of Block Z (and so on up the chain).
    • Verification is O(N), for N number of blocks
  • Alternative: Merkle trees
    • Tree structure where only the leaves hold data, and all other nodes simply store 2 hash pointers to 2 children
    • Verification of the structure takes O(log(N)) time, but that also means it takes less tampering on the part of an attacker
Advertisements

Notes from Andre Alexandrescu’s Modern C++ Design

Here are my highlights from Andre Alexandrescu’s Modern C++ Design: Generic Programming and Design Patterns Applied.

  1. Policy-Based Class Design
    • Overview:

      Policy-based class design fosters assembling a class with complex behavoir out of many little classes (called policies), each of which takes care of only one behavioral or structural aspect.

    • The challenge in any design is that you’re forced to choose a solution from a combinatorial solution space.
      • Experienced designers are like experienced chess players: they can see many “moves” in the future and anticipate tradeoffs from a particular “correct” design choice.
    • A “do-it-all” interface will fail because of the growing gap between syntactically valid and semantically valid uses of the library.
    • Design-targeted libraries need to help user-crafted designs to enforce their own constraints, rather than enforcing predefined constraints.
    • Policy-based design combines the advantages of multiple inheritance and templates, while simultaneously avoiding their weaknesses.
      • Policy: a class interface (or class template interface) with inner type definitions, member functions, and member variables that emphasizes behavior rather than type.
        • Reminiscent of the Strategy design pattern
        • Unique advantages: static binding and type knowledge
      • Unlike a Java-style interface, a policy “interface” is loosely defined. They specify syntactic structures that should be valid rather than the exact functions that should be implemented.
        • As a consequence, different implementations of a policy may have different (or extended) behavior.
          • If you use a function not provided by all policy classes, your code will work just fine until you change concrete policies (as you would expect).
      • Host classes assemble policies together into a single complex unit. (E.g., WidgetManager uses the Creator policy to define how widgets get created (p. 9).)
      • By using template template parameters, you can avoid redundant (or easily inferred) parameters and give the host class access to the template. Thus, do this:
         // Libarary code
         template <template <class Created> class CreationPolicy>
         class WidgetManager : public CreationPolicy<Widget> {}
        
        
         // App code
         typedef WidgetManager<OpNewCreator> MyWidgetManager;
        

        (The Widget in the library class’s inheritance allows it to specify the type it works with—it is the Created class.)

      • Combining policy classes in a single host class:
         // Library code
         template <class T, template<class> class CheckingPolicy, template<class> class ThreadingModel>
         class SmartPtr;
        
        
         // App code
         typedef SmartPtr<Widget, NoChecking, SingleThreaded> WidgetPtr;
        
        • The most important thing to consider in decomposing a class into policies is that the policies should ideally be orthogonal (they shouldn’t interact).

Stupid Type Conversions in C++98

If you’re working on a C++98 project, you have my condolences.

There are a number of type conversions that newer versions of C++ make super easy, but which are not included in C++98.

Here’s my reference for these:

String to…

Int

You need the equivalent of stoi(). Here’s mine:

static int stoi( std::string s ) {
    int i;
    std::istringstream(s) >> i;
    return i;
}

Double

Incredibly, C++98 has atof(const char*). (You’ll just need to call .c_str() on your std::strings.)

Number (int or double) to String

You need the equivalent of std::to_string(). Here’s mine:

#include <sstream>

template<typename T>
std::string to_string(const T& value)
{
    std::ostringstream oss;
    oss << value;
    return oss.str();
}

C++ Pointers & References Cheat Sheet

My dirty little secret: I’ve spent too much time in fancy-pants languages like Java… and Python… and Ruby… and PHP… and Javascript… to remember what the & does all the time.

This is my cheat sheet. Hope it helps you too.

The Golden Rule

If & is used to declare a variable, it makes that variable a reference.

Otherwise, & is used to take the address of a variable.

Pointers

Initialization

int myInteger = 42;
int myIntPointer = &myInteger; // "address of"
cout << *myIntPointer; // prints 42

As a parameter

void foo( std::string * stringPtr ) {
     cout << *stringPtr; // prints "Aloha"
}
std::string s = "Aloha";
foo( &s ); // pass Aloha's address to the pointer

References

Pass-by-value

void foo( int x ) {
     x = 42; // x is passed by value, so no change occurs outside this function
}
int x = 0;
foo(x);
// x == 0

Pass-by-reference

void fooMutate( int &x ) {
    x = 42; // passed by reference, so x changes outside this scope
}

int x = 0;
fooMutate(x);
// x == 42

Declaring a reference (alias)

int x = 0;
int y = &x; // y is now an alias of x
y = 42;
// x == 42