Aug 22, 2010 12:48
A while back I experimented some more with MurmurHash and extended my test suite to cover 32/64/128-bit hashes and in the process put together MurmurHash3, which was yet again faster by a considerable margin than MurmurHash2.
I'm rather glad I didn't publish it, as a guy in Russia found a rather odd flaw with MurmurHash2 that also appears in the unpublished MurmurHash3 -
Strings that contain nothing but repetitions of the same substring ("wordwordwordwordwordword", "hashinghashinghashinghashinghashinghashinghashing", etc) are far more likely to collide than would be expected due to chance, and increased repetitions increase the collision rate. I haven't pinned down the mechanism for this yet, but it seems like the repetition causes the number of possible states for the intermediate hash value to shrink.
So, fixing that is a reasonable goal for MurmurHash3 - I've ditched the earlier version and am doing a rewrite.
The mix function I'm looking at now is of the form
uint32_t c1 = 0x47581a31;
uint32_t c2 = 0x9ff73e3b;
int l = -(len/4);
for(const uint32_t * d2 = (const uint32_t *)key - l; l; l++)
{
uint32_t k = d2[l];
k *= c1;
k = _rotl(k,13);
h = h*3+0x38472123;
k *= c2;
c1 = c1*9+0x3a5beab4;
c2 = c2*5+0x412c771e;
h ^= k;
}
The constants are just random and have not been tested, and the weird syntax is there to coerce the compiler into generating the shortest possible loop. The above loop compiles down to 10 instructions -
mov ecx,dword ptr [edx+eax*4]
imul ecx,edi
rol ecx,0Dh
imul ecx,ebx
lea esi,[esi+esi*2+38472123h]
xor esi,ecx
add eax,1
lea edi,[edi+edi*8+3A5BEAB4h]
lea ebx,[ebx+ebx*4+412C771Eh]
jne MurmurHash3_32+40h (61040h)
Note that the multiply-add steps are accomplished in a single cycle via a creative use of the "lea" (load effective address) instruction thanks to the compiler's optimizations. This runs about 22% faster than MurmurHash2 - presumably due to fewer data dependencies (no copy-shift-xor step).
This means that each 32-bit word of the key is mixed using a different variant of the basic multiply-rotate-multiply operation, which will be tricky to test. I'll need to find constants that ensure that at least the first few hundred generated variants don't have any pathologically bad behaviors, otherwise the hash would have "holes" where key bits don't affect the output in a random fashion.
It's looking quite promising so far - everything passes my current tests, just need to bang on the constants and adapt it to 64- and 128-bit versions and it'll be publishable.
-tanjent