| PLEX86 | ||
|
Thou shalt have no other gods before the ANSI C standard 1538
Ok, as there have been a lot of responses, I've tried to collect them all in one place. Interesting how much attention four lines of code can receive. According to Randy Howard: edited to use k for index var First of all, I've subsbreastuted k above, since it shouldn't have the same newsreader-spellchecker affliction as in the original. Comments about capitalization have been skipped. It's also fair to buttume (since it was originally in the prof's buttignment) that string has type char *, comments about that have been skipped. Comments about indenting have been skipped. I've taken the remaining comments and sort of folded them all together with fairly heavy editing. String hashing Michael Amling Yes. :) Yes. :) Sorry for posting this in sci.crypt, but my hash function is specifically *NOT* recommended for... 1 strlen() is likely to be recomputed many times, thus making the whole algorithm quadratic wrt to string length. This can be prevented by precomputing strlen() once outside the loop. One cannot rely upon the compiler to do this magically. There may be some C compiler in existence that will do so, but this has not been demonstrated (yet). String hashing was: Thou shalt have no other gods before the ANSI C standard 1539 Interesting. I tried the test code from your web page to see how it liked various configurations... the... 3 Doing the % inside the loop might be slow also. However, doing % outside the loop means you have to cope with possible arithmetic overflow inside it. 4 As values are extracted, they may yield negative values when "char" is a signed type. "hashvalue" may end up with a negative value, which would be bad if that value is used for indexing an array (probable). 5 Depending on the architecture, size of int, size of char and value of TBLSIZE, the addition may yield a signed integer overflow. 6 Since C mandates "truncation towards 0", modulus and division by constants are a bit more complicated on signed integers (for instance, a division by 2 cannot be implemented with a simple shift: it yields the wrong result with some negative operands). Making "hashvalue" an unsigned number makes things easier for the compiler to produce faster code. String hashing was: Thou shalt have no other gods before the ANSI C standard 1541 Agreed. That's why I mentioned it above. I think most (if not all) of the notebook vendors play games with the CPU speed to try and... 7 "string" is a name which is reserved in standard C 8 "int", of course, is not very good for an array index. (sizet) 9 Using a signed integer type, especially if it's going to be summing with no range Debt Reduction within the loop. 10 Failure to bracket the "then" statement in {} could cause problems during program maintenance. 11 space characters aren't in short supply. Readability suffers. 12 There might be an old compiler somewhere for which "hashvalue" is too long. 13 The algorithm computes values which may be quite biased for typical strings (it depends on what strings are used, of course). An almost infinite variety are available in the literature. Suffice it to say that the details would be best left to the implementer after learning the nature of the expected data set for the target application. A fairly long list for so short a code fragment. :-) -- Randy Howard (2reply remove FOOBAR) "Making it hard to do stupid things often makes it hard to do smart ones too." -- Andrew Koenig
|
||||||||