Optimal Performance String Searching with Hashes Re: VChar vs VText

Ruslan Zasukhin sunshine at public.kherson.ua
Mon Nov 28 14:59:26 CST 2005


On 11/28/05 4:24 AM, "Ed Kleban" <Ed at Kleban.com> wrote:

>> But you cannot do range searches using hashing, right ?
> 
> Yes, that is correct.
> 
> You also raised this the last time I mentioned using hashes, and both
> times you essentially implied that there is some drawback to not having the
> ability to do > or >= or < or <= searches.  I suppose that for some
> applications that may be true, but it is virtually never a drawback for my
> applications.  In fact I've never even considered wanting to do such a
> thing.
> 
> Question: What applications do you have in mind that range searches on
> strings would be useful for?

Strings have an alphabetical order, so we must not prevent developers from
using range searches.
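
To illustrate the difference, here is a minimal sketch in Python (not
Valentina code; the names hash_index, sorted_index and range_search are made
up for this example). A hash-style index answers only exact-match questions,
while an index that keeps keys in order can also answer range queries:

```python
import bisect

names = ["delta", "alpha", "charlie", "bravo", "echo"]

# Hash-style index: exact-match lookups only.
hash_index = {name: row for row, name in enumerate(names)}

# Sorted index: keeps keys in order, so it can answer range queries.
sorted_index = sorted(names)


def range_search(index, low, high):
    """Return all keys k with low <= k < high from a sorted list."""
    start = bisect.bisect_left(index, low)
    end = bisect.bisect_left(index, high)
    return index[start:end]


exact = hash_index.get("charlie")                # row number, or None if absent
in_range = range_search(sorted_index, "b", "d")  # keys that sort in ["b", "d")
```

The sketch compares by code point. A real database index would compare
according to the collation chosen for the column.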

But you are right. It is a good idea to add one more option to choose:
    a hash-style kind of index.

You can add this as a feature request in Mantis.


> For 99% of my applications I want to be able to:
> 
> 1) See if I already have a given string (or object, or structure, or
> whatever) in my table.
> 
> 2) If I do, then find that item, or lack thereof, as fast as possible.
> 
> 3) And then typically add the item into the table if it is not already
> there.
> 
> I am therefore doing this on a column of, say, strings, which I
> always know will be unique.
> 
> For a table of bytes, shorts, or longs, simply performing a binary search by
> using a FindSingle on an indexed unique column in Valentina is going to be
> pretty darn fast.
> 
> For a table of strings, however, it's a whole 'nuther story.  First you have
> the overhead of comparing strings, which is much, much slower than comparing
> integers.  On top of that, as of V2 strings are stored in tables in Unicode
> and are therefore twice the size, although that does not imply twice the
> time, since most comparisons fail within the first few characters.
> But the really scary part that I
> have no clue about is that there are so many parameters you can choose from
> for the CollationAttribute, plus the fact that you're calling into some
> foreign package to perform those comparisons.  I have no clue what extra
> overhead burden there is for making these comparisons, but I'm presuming the
> answer is "at least a little bit".
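
The check, find, then insert pattern quoted above can be sketched roughly as
follows. This is only an illustration in Python, not Valentina's API and not
necessarily the scheme Ed uses: the class HashedStringTable, its methods, and
the choice of CRC-32 as the hash are all assumptions made for the example.
The point is that the index is searched by integer, and a full string
comparison happens only for rows whose hash already matched:

```python
import zlib


class HashedStringTable:
    """Toy table: rows of unique strings plus a hash -> row-numbers index."""

    def __init__(self):
        self.rows = []      # stored strings, one per row
        self.by_hash = {}   # 32-bit hash -> list of row numbers

    @staticmethod
    def _hash(text):
        # CRC-32 over the UTF-8 bytes; any stable integer hash would do.
        return zlib.crc32(text.encode("utf-8"))

    def find(self, text):
        """Return the row number of text, or None if it is absent."""
        for row in self.by_hash.get(self._hash(text), []):
            # Full string comparison only for rows whose hash matched.
            if self.rows[row] == text:
                return row
        return None

    def find_or_add(self, text):
        """Return (row, added): the existing row, or a newly appended one."""
        row = self.find(text)
        if row is not None:
            return row, False
        self.rows.append(text)
        row = len(self.rows) - 1
        self.by_hash.setdefault(self._hash(text), []).append(row)
        return row, True


table = HashedStringTable()
first = table.find_or_add("Valentina")   # not present yet, so it is added
second = table.find_or_add("Valentina")  # already present, same row returned
missing = table.find("VText")            # never added
```

Note that this compares strings byte for byte. If two spellings must count
as equal under a collation, the hash would have to be computed on a
normalized form of the string instead.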

Well, we use the IBM library. I believe the IBM guys have done their job
perfectly.

And we have no other choice. We cannot write our own string comparison method
for each natural language.
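
As a rough illustration of why collation settings add work to each
comparison, here is a simplified stand-in written with Python's standard
unicodedata module. It is not the IBM library and not Valentina's
CollationAttribute handling; the function sort_key and its three strength
names are assumptions made for this sketch, and real collation rules are
language-specific and far more involved:

```python
import unicodedata


def sort_key(text, strength="tertiary"):
    """Simplified stand-in for a collation key; not a real collation library."""
    decomposed = unicodedata.normalize("NFD", text)
    if strength == "primary":
        # Ignore accents and case: drop combining marks, then case-fold.
        base = "".join(c for c in decomposed if not unicodedata.combining(c))
        return base.casefold()
    if strength == "secondary":
        # Keep accents, ignore case.
        return decomposed.casefold()
    # "tertiary": keep both accents and case.
    return decomposed


accented = "R\u00e9sum\u00e9"   # "Resume" with acute accents on both e's
lowered = "r\u00e9sum\u00e9"    # same word, all lower case

primary_equal = sort_key(accented, "primary") == sort_key("resume", "primary")
secondary_equal = sort_key(accented, "secondary") == sort_key("resume", "secondary")
case_only_equal = sort_key(accented, "secondary") == sort_key(lowered, "secondary")
```

Every comparison under such settings has to normalize or transform both
strings first, which is the kind of extra cost the quoted text is asking
about.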


-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]



