About Beatrix SEARCH

Ruslan Zasukhin ruslan_zasukhin at valentina-db.com
Sun Dec 1 03:02:27 CST 2013


On 11/30/13, 8:22 AM, "Beatrix Willius" <bwillius at gmx.de> wrote:

Hi Beatrix,
Hi Jon,

> as you may remember I store mails in my Valentina databases. Searching for
> values in From, To or dates is relatively fast. But searching for content in a
> mail is dog slow. My customers have databases with >100.000 mails. This needs
> to be fast. Just go into your own mail client and try to search for a text.
> When I try to do this with my 140.000 mails database this takes over a minute.
> This is the single task that I want to optimize:
> 
> WHERE Message.MessageBody REGEX '(?i)something'

This is NOT INDEXED search
    by a single TEXT field 'Message.MessageBody '
    Case-insensetive

This search can do more than left(), because it is able to find substr.


Since this is not-indexed search, then of course you iterate each time the
whole TEXT (blob) field, which is not fast for iteration, and you have
linear task anyway.


Q1: Beatrix, do you really want substr() search in each word? May be
   START WITH ... Search, i.e. Left() will be okay for most cases?
    if yes, then you can use left() -- there is FAQ about this it seems.


Q2: you have right now IndexByWords, but it is case-sensitive. Yes?
May be index should be case-insensetive?


===========
And about effective substr() ... Hmm.

I will try to find time to read about fast algs for this task ...
May be some hashes ... May be some rotation of chars ...

For example  'abcde'   go to index many times as:

          abcde
          bcde
          cde
          de

YES this will make operation of indexing and INSERTs slower.
But you always have some losts to get some win in other place.





==============
Beatrix, please add advice from Kirill about another engine, to your own
issue in Mantis about Lucena


-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]




More information about the Valentina mailing list