PLEX86  x86- Virtual Machine (VM) Program
 CVS  |  Mailing List  |  Download  |  Newsgroups

String hashing was: Thou shalt have no other gods before the ANSI C standard


Your Ad Here

Your Ad Here

... snip ...

I just downloaded your test program, and I see various difficulties. First, it is sadly (and unnecessarily) lacking in portability, using ints in many places where an unsigned long is needed. (There is no guarantee that an int has 32 bits).

Second, it tests the wrong things. It creates an arbitrary binary byte array of length 256. This is an unusually long string for any hash table use. In addition, most real strings will consist of printable characters, not arbitrary bytes. If the string length is lowered to 5 or 10 the efficacy of the times31 or times33 hash shows up, and the SuperFastHash is relegated to a much lower speed. The SFH is probably well suited for binary keys of long length.

Spread of the hashes over the 32 bit spectrum is not important. The important thing is the spread over the current size of the hashtable, which had better not be a 32 bit sized item. This is normally attained by either masking or modulo operations. As ever, the optimum hash function depends on the particular data to be hashed. It simply cannot be optimized on its own.

As soon as I find a round tuit I expect to try the various functions out on real data with a real hash table. My hashlib will supply the table (go to my site for a copy). I expect to use a text version of N868 plus 1 that I have here as the data sample, it is roughly 1 meg of text. I can store either words, defined as things that end with white space, or lines, being things that end withn. The hashlib instrumentation will tell a lot about the efficacy of the hash functions ON THAT DATA.

I will have to think about how to eliminate the biases from file reading. Virtual memory thrashing will show up as a sharp change in time versus size measurements.

String hashing was: Thou shalt have no other gods before the ANSI C standard 1548
CBFalconer This is not comp.lang.c. The dead ANSI committee dogma has no audience here. (Or well it does -- from some trolling cross posters, but I don't think your blind adherence to some non-existent...
String hashing was: Thou shalt have no other gods before the ANSI C standard 1550
Tim Peters ... I wouldn't know, since I haven't used either of those. I don't claim that it's not true. I've been on the receiving end of hash bug reports against...

-- Available for consulting-temporary embedded and systems.



Your Ad Here

List | Previous | Next

String hashing was: Thou shalt have no other gods before the ANSI C standard 1548

Alt Folklore Computers from Newsgroups

The #1 Usenet Provider on the Internet

String hashing was: Thou shalt have no other gods before the ANSI C standard 1546