Database Large VarChar Search

Ruslan Zasukhin sunshine at public.kherson.ua
Sun Apr 4 09:58:07 CDT 2004


On 4/4/04 6:47 AM, "Andy Dent" <dent at oofile.com.au> wrote:

Hi Andy,

I hope you don't mind if I CC my answer to the Valentina list and the RB list,
because you have asked good questions,
and I think many people will be interested in the answers to such questions.

> G'day Ruslan
> 
>> Valentina is able to do LIKE search and REGEX search in the time
>> practically equal to INDEXED search!!!
> 
> Are you seriously saying that if I stored say 20,000 documents with
> an indexed Title field and a Memo field with 2MB of text on each that
> searching for an arbitrary regex in that Memo would be a similar
> speed to an indexed search for a title?
> 
> Either you got a little carried away with your claim or you have
> better searching than any other database engine that I've ever heard
> about.
> 
> I can believe that Valentina is very fast up to a certain size level,
> certainly much better than other engines commonly used in RB. I have
> a lot of trouble believing your algorithms operate at a different
> order where you can make such a claim.
> 
> I know about text indexing technology (for static databases) such as
> discussed in Managing Gigabytes (http://www.cs.mu.oz.au/mg/). I
> wasn't aware it was even theoretically possible to optimise regex.
> 
> Note: I'm not contemplating OOFILE as a competitor against Valentina
> - Faircom's pricing puts that out of the question with the developer
> version of c-tree Plus retailing at US$895. It might, however, be
> useful in future to put Valentina behind the OOFILE interface as
> another database backend and thus enable the GUI and Report-Writer
> classes to be used transparently with Valentina.

1) BTW, I was never able even to test c-tree Plus, because these guys do not give
any demo for free. But since they use

2) I do not see a problem if some day you use Valentina as a backend for
OOFILE. This can be a win-win situation. You have your own audience, as I
understand it, mainly CodeWarrior C++ PowerPlant developers.
The Valentina C++ SDK has its own OO wrapper, which I think is unique from many
points of view, but OOFILE has the power of integration with your GUI classes.
So why not?

Actually, we are developing the Valentina 2.0 kernel in such a way that it will be
easy to make several different OO wrappers around the main kernel. For example,
I see the possibility of a true ODMG-standard wrapper, or an OO wrapper like the
one made around the MySQL C API (it is oriented toward STL/std classes).

Valentina is very flexible (like any good woman).


3) ANSWER: I did not invent a new REGEX search, and I am not going to do that!
We use standard REGEX libraries. In particular, in Valentina 2.0 we use the RegEx
features of the IBM ICU library.

4) ANSWER: I have already underlined this many times, and I repeat it once again:
        the main complexity of DBMS development and DBMS usage
        is that there exist hundreds and thousands of cases/conditions.
        In one case one algorithm/format works better;
        in another case a different algorithm works better.

This is why there is no trust when somebody says:
    Hey, in 2-5-6 months I will make a good db engine.

NOT TRUE. Very soon he will start to see those thousands of cases.
What will the next step be?
    - drop the work
    - make something simple and very limited
    - assume a lot of limitations, and again handle only the simple case.

5) So, taking point 4) into account, please note that there exist MANY quite
different kinds of text search, depending on the field type:

a) fixed string, e.g.    String[40]
b) VarChar string, e.g.  VarChar[504]
c) TEXT (BLOB)

In your letter you ask about case c), and your mind is targeted at
FULL TEXT indexing of big BLOB fields.

My "claims" were about cases a) and b) in the first place.
Actually, these are not my claims; these are the results reported by many
Valentina developers.

If you want to know how Valentina is able to do a non-indexed REGEX search on a
String or VarChar field with speed close to an INDEXED search on the same
field, the answer is: thanks to the special storage format of Tables.
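
To make the idea of a non-indexed REGEX scan concrete, here is a minimal sketch
in Python. It is NOT Valentina's actual table format, which this letter does not
describe. It only assumes, for illustration, that a String[40] column is stored
as one contiguous buffer of fixed-width slots, so a scan reads only that
column's bytes. The names pack_column, regex_scan and RECORD_WIDTH are
hypothetical, and Python's re module stands in for the ICU RegEx engine.

```python
import re

RECORD_WIDTH = 40  # assumed width, mirroring the String[40] example


def pack_column(values, width=RECORD_WIDTH):
    """Pack strings into one contiguous bytes buffer of fixed-width slots."""
    buf = bytearray()
    for v in values:
        raw = v.encode("utf-8")
        if len(raw) > width:
            raise ValueError("value longer than the field width")
        buf += raw.ljust(width, b"\x00")  # pad each slot with NUL bytes
    return bytes(buf)


def regex_scan(column, pattern, width=RECORD_WIDTH):
    """Return record numbers whose field matches pattern (no index used)."""
    rx = re.compile(pattern.encode("utf-8"))
    hits = []
    for rec_no in range(len(column) // width):
        start = rec_no * width
        field = column[start:start + width].rstrip(b"\x00")
        if rx.search(field):
            hits.append(rec_no)
    return hits


titles = ["Managing Gigabytes", "Valentina Kernel", "OOFILE Report Writer"]
column = pack_column(titles)
matches = regex_scan(column, r"^V.*Kernel$")
```

The scan visits every record, so its cost is proportional to the number of
records times the field width; the assumed layout only keeps that work compact
and sequential.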

Indexed search also has a few quite different cases:
    - exact search, which finds only one record
    - range search, which finds 0..N records
        -- narrow range search, which finds e.g. 100 of a million records
        -- wide range search, which finds e.g. 800,000 of a million.

So which of the above are we comparing to?  :-)

In fact, an exact indexed search will be much faster than REGEX even in
Valentina. But a wide range search can be comparable to REGEX.
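
The cases above can be sketched as follows. This is only an illustration under
stated assumptions: a sorted Python list with bisect stands in for a real index
(it is not Valentina's index structure), and build_index, exact_search,
range_search and regex_scan are hypothetical helper names. The point it shows is
that an exact lookup touches very few entries, while a wide range search has to
return most of the table, just as a full scan does.

```python
import bisect
import re


def build_index(values):
    """Sorted (value, record_number) pairs, a stand-in for a real index."""
    return sorted((v, i) for i, v in enumerate(values))


def exact_search(index, key):
    """Binary search: O(log n) steps plus one step per matching record."""
    pos = bisect.bisect_left(index, (key, -1))
    hits = []
    while pos < len(index) and index[pos][0] == key:
        hits.append(index[pos][1])
        pos += 1
    return hits


def range_search(index, low, high):
    """Record numbers with low <= value < high; cost grows with hit count."""
    lo = bisect.bisect_left(index, (low, -1))
    hi = bisect.bisect_left(index, (high, -1))
    return [rec for _, rec in index[lo:hi]]


def regex_scan(values, pattern):
    """Non-indexed search: test every record against the pattern."""
    rx = re.compile(pattern)
    return [i for i, v in enumerate(values) if rx.search(v)]


values = [f"name{i:06d}" for i in range(10_000)]
index = build_index(values)

exact = exact_search(index, "name000042")                 # one record
narrow = range_search(index, "name000100", "name000110")  # 10 records
wide = range_search(index, "name000000", "name008000")    # 8,000 records
scan = regex_scan(values, r"^name00[0-7]")                # same 8,000 records
```

Here the wide range search and the regex scan return the same 8,000 of 10,000
records, so both must do work proportional to most of the table, while the
exact search stays close to a handful of comparisons.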

Again, hundreds of cases! Thousands of combinations!!!

In Valentina we try to keep all of them in mind, and for each to provide the best
format and algorithm. Even if an algorithm gives a win of only a few percent,
we will spend the time to develop it. We are not lazy.  :-))


-- 
Best regards,
Ruslan Zasukhin      [ I feel the need...the need for speed ]
-------------------------------------------------------------
e-mail: ruslan at paradigmasoft.com
web: http://www.paradigmasoft.com

To subscribe to the Valentina mail list go to:
http://lists.macserve.net/mailman/listinfo/valentina
-------------------------------------------------------------


