PLEX86  x86- Virtual Machine (VM) Program
 CVS  |  Mailing List  |  Download  |  Newsgroups

String hashing was: Thou shalt have no other gods before the ANSI C standard 1551


Your Ad Here

Your Ad Here

Tim Peters

But if you've never seen the Bob Jenkin's hash before, or the other functions he examined, then you've still only seen half the problem. I.e., you know your critieria and have managed to find a single solution, but you haven't considered all the possibly solutions.

It would be nice to know the difference before I comment on it more fully.

"x2",

Thou shalt have no other gods before the ANSI C standard 1555
not to mention plain text in various languages other than English. And another that uses the lowercase o for the number 0, and one...

This specific property can be achieved using my hash (or any other good hash) in the following way:

smallTableHash(data,n) = SFH(data,n-1)^datan-1;

Thou shalt have no other gods before the ANSI C standard 1552
Oh, a useless game. I am jumping in, I cannot resist. -- "I" is a poor name for a variable. Especially if the program also (that's not strictly necessary, but doing otherwise makes little...

A little analysis shows that some small amount of work can also be put into making something like that the last 14 bits of data output to be a basically exactly 1-1 with the last 14 bits of the input data, but setting the initial value of the hash accumulator to an appropriate constant in conjunction with the trick above.

Another way, if you are willing to give up on the avalanching properties is to slightly adjust the case 3: remainder case (to put the bottom char into the bottom output) and drop all the "avalanching" steps at the end. This will also make it faster (but it will have worse collisions in the general case.)

Oh its worse than that I'm afraid. For the prime 16777619 and offset 2166136261U, here's an interesting sequence:

Thou shalt have no other gods before the ANSI C standard 1554
I think you missed some of the bigger fish, as other responses should have made clear. It was an buttignment to learn about hashing, which is used in performance critical code quite...

These all come out to 0, BTW. This is why they say you need to "xor-fold" btw. (I *searched* just for the first byte, then deduced the obvious second byte.) So long as the hash is seeded with a non-zero value, you can't do this to SFH will less than a 4-byte initial sequence (that has to be deliberately scanned for each given seed). Since the seed hash is set to the length in the given implementation, you can't even just use a fixed header.

You're in luck! Its all 3 of those things! See:

Of course its *properties* are not portable to 64 bits -- you need to truncate the accumulator down to 32 bits on the right shifts to ensure this.

That seems odd to me -- you are saying the speed of some Python bytecode kernels is limited by hash speed? If so, you should probably either try my hash function immediately and-or solicit my help to see how this can be improved (I am actually a big fan of the Python language.)

Well, ok, so objectively speaking, after looking at say Bob Jenkin's article on good hash functions, would you say you seperated the wheat from the chaff? Or did you get partially lucky in picking just one function that happens to have some set of properties you were looking for?

How are the constants arrived at? Looking at the ones listed on the "FNV" page, the multipliers apparently are prime numbers with a specific binary form. I can't figure out the offsets for the life of me. I can sort of see how the chosen multipliers seem to work, but I don't see how I am supposed to actually *measure* how good they are, since they clearly gave up on having full avalanching from the very get go (the bottom 7 bits can never be correlated with the top most bit of any input byte, and at least with the 16777619 multiplier, bits 16 through 23 are always at least one interation behind the input byte).

My loop has lots of content -- but its mostly just a bunch of unrolling and end tweaks. There's no mystery about how I created the function, and you can guess exactly how all the constants were arrived at -- I just literally searched for them in two steps. The inner loop constants searched for creating the fastest and pervasive avanching as possible (I could only get it to fully avalanche the first n-127 bits) then final end code just mixes enough to avalanche the final 127 bits. Compare my code with the "One-at-a-time" hash and you will see that it is clearly just an extension of it.

Well, even a bad compiler will do that for me. :) I do that to show my "units". I had nasty physics teachers that would dock me marks in public school if I forgot my units.

Its not complex to implement. Its complex to *understand* it. Alternatively you can go through life believing that things like the 31-33 hash function are actually good because it seemed to work well the few times you used it.

Well, my function is only meant for 32 bits. You could probably get away with something like:

SFH(x) : SFH("somefixedheader-32bits" + x)

for 64 bits, but then it isn't faster than the 64-bit version of FNV. Hashing for the purpose of hashtable entries, however, are not going to need more than 32 bits. So I don't quite see the need for Python to populate a full long with good hash bits, when long is 64 bits.

Thou shalt have no other gods before the ANSI C standard 1557
Is this better 1 hoist: proc(n); 0000 HOIST: C07C 0000 .entry 52 52 F7 0006 cvtlw r2,r2 56 52 32 0009 cvtwl r2,r6 52 56 D0 000C movl r6,r2 5E...

-- Paul Hsieh



Your Ad Here

List | Previous | Next

Thou shalt have no other gods before the ANSI C standard 1552

Alt Folklore Computers from Newsgroups

The #1 Usenet Provider on the Internet

String hashing was: Thou shalt have no other gods before the ANSI C standard 1550