Optimal Performance String Searching with Hashes Re: VChar vs VText

Ed Kleban Ed at Kleban.com
Wed Nov 30 11:12:21 CST 2005




On 11/30/05 10:47 AM, "Ruslan Zasukhin" <sunshine at public.kherson.ua> wrote:

> On 11/30/05 6:38 PM, "Ed Kleban" <Ed at Kleban.com> wrote:
>   
>> There is one likely hitch however.  This will only work for certain
>> collation attributes unless you always cast a string into some normalized
>> form before performing the hash.   So in the case of kStrength, kPrimary for
>> example, you'd have to cast the string to a version with no accents and all
>> lowercase before calculating the hash.  But yeah, I guess that's readily
>> doable as well.
> 
> :-)
> 
> It needs check, may be IBM guys have hash methods.
> Something they have exactly.
> 

Be very careful.  You don't want or need a hash with fantastic polynomial
distribution characteristics like CRC, you don't need a secure hash like
MD5.  You simply need a very very "good enough" fast hash, like "rotate by 5
bits and xor".   I'd look to see if the IBM guys have a normailzation method
for converting a string to a standard form and then use your own hand-coded
hash algorithm on that.  I'd be very wary of using another "hash" algorithm
without understanding exactly what it's doing.  It's the primary bottleneck
in making this run fast.





More information about the Valentina mailing list