About Beatrix SEARCH
Ruslan Zasukhin
ruslan_zasukhin at valentina-db.com
Sun Dec 1 03:02:27 CST 2013
On 11/30/13, 8:22 AM, "Beatrix Willius" <bwillius at gmx.de> wrote:
Hi Beatrix,
Hi Jon,
> as you may remember I store mails in my Valentina databases. Searching for
> values in From, To or dates is relatively fast. But searching for content in a
> mail is dog slow. My customers have databases with >100.000 mails. This needs
> to be fast. Just go into your own mail client and try to search for a text.
> When I try to do this with my 140.000 mails database this takes over a minute.
> This is the single task that I want to optimize:
>
> WHERE Message.MessageBody REGEX '(?i)something'
This is NOT INDEXED search
by a single TEXT field 'Message.MessageBody '
Case-insensetive
This search can do more than left(), because it is able to find substr.
Since this is not-indexed search, then of course you iterate each time the
whole TEXT (blob) field, which is not fast for iteration, and you have
linear task anyway.
Q1: Beatrix, do you really want substr() search in each word? May be
START WITH ... Search, i.e. Left() will be okay for most cases?
if yes, then you can use left() -- there is FAQ about this it seems.
Q2: you have right now IndexByWords, but it is case-sensitive. Yes?
May be index should be case-insensetive?
===========
And about effective substr() ... Hmm.
I will try to find time to read about fast algs for this task ...
May be some hashes ... May be some rotation of chars ...
For example 'abcde' go to index many times as:
abcde
bcde
cde
de
YES this will make operation of indexing and INSERTs slower.
But you always have some losts to get some win in other place.
==============
Beatrix, please add advice from Kirill about another engine, to your own
issue in Mantis about Lucena
--
Best regards,
Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc
Valentina - Joining Worlds of Information
http://www.paradigmasoft.com
[I feel the need: the need for speed]
More information about the Valentina
mailing list