Optimal Performance String Searching with Hashes Re: VChar vs
 VText
    Ed Kleban 
    Ed at Kleban.com
       
    Wed Nov 30 11:12:21 CST 2005
    
    
  
On 11/30/05 10:47 AM, "Ruslan Zasukhin" <sunshine at public.kherson.ua> wrote:
> On 11/30/05 6:38 PM, "Ed Kleban" <Ed at Kleban.com> wrote:
>   
>> There is one likely hitch however.  This will only work for certain
>> collation attributes unless you always cast a string into some normalized
>> form before performing the hash.   So in the case of kStrength, kPrimary for
>> example, you'd have to cast the string to a version with no accents and all
>> lowercase before calculating the hash.  But yeah, I guess that's readily
>> doable as well.
> 
> :-)
> 
> It needs check, may be IBM guys have hash methods.
> Something they have exactly.
> 
Be very careful.  You don't want or need a hash with fantastic polynomial
distribution characteristics like CRC, you don't need a secure hash like
MD5.  You simply need a very very "good enough" fast hash, like "rotate by 5
bits and xor".   I'd look to see if the IBM guys have a normailzation method
for converting a string to a standard form and then use your own hand-coded
hash algorithm on that.  I'd be very wary of using another "hash" algorithm
without understanding exactly what it's doing.  It's the primary bottleneck
in making this run fast.
    
    
More information about the Valentina
mailing list