stdin: Re: Performance

From: Dave Hudson <dave_at_nospam.org>
Date: Thu Jan 19 1995 - 12:22:51 PST

Hi,

Greg Hudson wrote:
>
> The problem is when you're hashing a lot of keys which happen to be
> divisible by N, where N is a smallish number. If you're using a prime
> modulus, this doesn't cause a problem, but if you're using a power of
> 2 modulus and N is, say, 4, then you're only going to wind up using
> 25% of your hash buckets.

Hmm, this is something I've never looked at before, but having just
quickly looked at a couple of references I think you've explained why I saw
no real advantage from what I did with the hashing.

> One solution is to use a multiplicative hashing function, as described
> by Knuth on pp. 508-512 of volume 3 of _The Art of Computer
> Programming_.

The immediate problem I see is that with such a large multiplier we still
end up executing a really expensive opcode (almost identical to the div),
although I guess it may be faster for CPUs with no divide instruction.

One thing that does spring to mind though is that if I analyse the keys
we're using and ensure that they're sufficiently randomised for our purposes
then we'll still get a good spread. I think I'll look at the current values
ASAP. I know the current code makes an attempt at this with the shift and
xor op, but perhaps the shift's no large enough say.

Regards,
Dave
Received on Thu Jan 19 14:01:54 1995

This archive was generated by hypermail 2.1.8 : Thu Sep 22 2005 - 15:12:17 PDT