When you say “hash function” in a room full of Python developers, people tend to think of the classics you get in hashlib: MD5, SHA-1, SHA-224, SHA-256, SHA-384, SHA-512, SHA-OHGODPLEASESTOP, and the like — cryptographic hash functions intended for cryptographic uses.
There’s another world out there, though: non-cryptographic hashes. Sometimes you just need to figure out if you’ve already seen a string or structure. Sometimes you need a basic checksum. Sometimes you need a hash that’s just fast and can fit into a 32 bit integer.
I’ll run through the state of the art in the world of non-cryptographic hashing — what your best options are, how they compare in terms of CPU and memory usage, and how they work. Who takes it? Whose mixing function reigns supreme?
Let’s find out.
* I’m saying “four”, but realistically I’m going to cover as many modern options as I can fit into the time available. Let’s say “four plus two or minus one” in reality.
Adam is a PHP Agenteer (it's totally a word) at New Relic who spends most of his time swearing at C compilers and the PHP API, usually not in that order. He notes that the quality of his swearing has gone downhill somewhat since his glory days of swearing at Python metaclasses.
In between said bouts of invective, Adam works on various open source projects, plays cricket, kayaks, and continues in his ongoing quest to find the best beer in the world. (Well, OK, it’s totally Westvleteren 12, but it’s a good excuse to drink more beer.)