| PLEX86 | ||
String hashing was: Thou shalt have no other gods before the ANSI C standard 1546Tim Peters String hashing was: Thou shalt have no other gods before the ANSI C standard 1548 CBFalconer This is not comp.lang.c. The dead ANSI committee dogma has no audience here. (Or well it does -- from some trolling cross posters, but I... Uhh ... right, but both Bob Jenkins and my hash are two hash functions which very clearly outperform it. String hashing was: Thou shalt have no other gods before the ANSI C standard 1550 Tim Peters ... I wouldn't know, since I haven't used either of those. I don't claim that it's not true. I've been on the receiving end of hash bug... Oh great -- you know Bob Jenkin's hash has been used a lot too (I learned about it from somebody who encountered it from someone else who used it based on yet another recommondation, with real world projects strewn in between) and I've never heard of any such reported weaknesses (though maybe Bob Jenkins is getting lots of reports of bad collisions and is keeping it to himself.) Its easy to play the falsification game -- certainly nobody has reported anything intrinsically bad about my hash function (OTOH, I've only gotten about 4 emails about it :) )! But I am fairly confident of the low collision of my hash function. You know why? Because I've put it through statistical and bit-avalanche testing. That's how it was *created*. I didn't just pick those parameters out of thin air! Then I verified certain properties of my mixing functions -- i.e., that its built out of injections. Ok, better than random on a 32-bit output? Think about that for a second. What exactly is he saying? If all he's doing is reducing the 1-2^32 collision odds for certain kinds of data, I'm not sure what value that is. Does FNV actually beat the birthday paradox? Now that might be something worth looking at, buttuming the clbutt of inputs for that was non-trivial. My hash has a collision probability of exactly 0 for any two inputs that differ only in a single aligned 32 bits worth of data, do you think FNV is referring to a similar property? Thou shalt have no other gods before the ANSI C standard 1553 On Sat, 26 Feb 2005 02:35:33 GMT in alt.folklore.computers, Randy That fact is recognized in the established professions, where you have... But this ignores the fact that real hash tables only take a subset of the output bits. In that case we are just up against the pigeon hole principle, and there is no way to have such a property, using FNV or my hash or any other hash. My hash does as well as possible by having a "full avalanching" property so hacking off just b bits, never puts you in a position of having worse than a 1-2^b probability of collisions for any pair. Notice that for the FNV hash, they recommend some kind of "xor-folding" mechanism -- so they don't seem to buttert the same kind of property. String hashing was: Thou shalt have no other gods before the ANSI C standard 1549 First, int it supposed to be "fast" on a given architecture, so I can see why the performance interest would look there first. Wanting to use it, and not blow up on systems... As far as I can tell, my hashing function, from a statistical point of view, achieves the best possible properties (with the exception of crypto properties -- artificial construction of collisions *is* possible) because it was specifically constructed to have these properties. buttuming my analysis is correct, FNV is not superior to my hash in this respect -- because it *can't* be. If Bob Jenkin's analysis is similarly correct, then I believe that FNV cannot be superior to his either. Ok, I don't believe FNV has any serious claim to having better collision properties (which, of course, is the first requirement for good hashing performance) than either my hash or Bob Jenkin's hash, or Colin Plumb's One-at-a-time hash. I have buttumed, and I think conservatively (since I know for sure mine has them), that we all made good hash functions with essentially "best possible" collision properties. If I am allowed to make this buttumption, then in fact the "microseconds per hash" *IS* the only thing remaining that matters. You may be jaded because CRCs, checksums, rotations, this "31-33" hash, and various other ad hoc solutions have serious collision problems. That doesn't mean that creation of good hash functions (for non-adversarial environments) is somehow magical or exceedingly difficult to do. That the FNV or the Bob Jenkins hash functions took a very convoluted route to achieve good collision properties, is not evidence of the need to take such a route. Pearson, Zobrist, One-at-a-time and my hash function are evidence of the fact that you can take much simpler approaches to generate a hash function with good properties. Mine just happens to be the one built around a particular very high performance inner core that was then tuned to have the required properties. About the only thing remarkable about the FNV hash is that it seems to scale to 64, 128, etc bit hashes -- but without a claim of good crypto, the utility of this escapes me. String hashing was: Thou shalt have no other gods before the ANSI C standard snip ... I just downloaded your test program, and I see various difficulties. First, it is sadly (and unnecessarily) lacking in portability, using ints in many places where... -- Paul Hsieh
|
||||
String hashing was: Thou shalt have no other gods before the ANSI C standard Alt Folklore Computers from Newsgroups The #1 Usenet Provider on the Internet
String hashing was: Thou shalt have no other gods before the ANSI C standard |
||||