Please confirm it is okay now -- MORE INFO

Ruslan Zasukhin sunshine at public.kherson.ua
Sun Jul 16 13:46:38 CDT 2006


On 7/15/06 9:53 PM, "Stan Busk" <maxprog at mac.com> wrote:

> 15 seconds is with my G5 Quad 2.5 - It is production system so it is
> with Valentina v2.3

> 12 seconds with my G5 Dual 2.0 - This is my testing system with
> Valentina 2.4b16
> 
> I use the same DB I sent to you. I also tried to delete index, no
> changed.

Hi Stan,

This is what I have got so far.

-----------
1) I have made some optimizations in VarChar field to do faster direct
iteration of column when we do LIKE search.

Now it takes not 0.15 but 0.10 of sec for one of your query.


-----------
2) I have found that your database is very not compact.
    I have made db.Clone() using Vstudio,
    and from 263MB blb file go to 136MB.

Smaller size of db also can speedup things!
So I recommend you also do db.Clone() on your side.


-----------
3) I have made Export/Import of field Article_Newsgroup into new v1
database. To see how much take v1

    v1 takes on this data and query   0.02 second.
    v2 now takes                      0.10 sec on default.
    v2 with disabled ICU collator     0.05 sec on default.


So we see that Collator slow down in 2 times...
Collator is enabled always. Because in general case we want that LIKE work
for any language...

But we can see that IN YOUR case you keep in this column names of
newsgroups. So this is only English, right ?

May be we need way somehow disable usage of collator for such fields..
But I do not have idea how todo this in API of Valentina.


-----------------
4) I will think MORE about LIKE of course.


-----------
5) But now the most important point...

Stan, tell me please what role plays field Article_Newsgroup  in table
Articles ?

    As Far as I see, you use it to link ONE article to MANY newsgroups.
    Am I right ?

This is not good idea !!!!!!
And there is where you have huge potential to speedup things.

You use format here:
    group1, group2, group3


*** NOTE btw *** in table Groups field Name is not unique.
                 I think it should be. Right?


So, you have table Groups, and you have table Articles.
You want establish relation Many to Many. Am I right ?

You use this LISTS of COMA separated groups. -- BAD !!!

    * you get big overhead on disk.
    * you get much slower things
    * you need use LIKE and do text searches.
    * BTW, I do not understand why you set Article_Newsgroup = IndexByWords


The correct design is described in any DB book:

    * you need to have third table GroupArticles with 2 fields
        GroupID, ArticleID as ULONG both.
    
    * in v2 it is possible to use BinaryLink here also.


>> Because articles has around 10 different states.
>> 
>>    U = Unread         (old 0)
>>    R = Read           (old 1)
>>    D = Deleted        (old 2)
>>    A = Archived       (old 3)
>>    S = Sent           (old 4)
>>    Q = Queued
>>    P = Purge
>>    L = Log
>>    F = Followed-up
>> 
>>    For example: RA = Read and archived
>>                 RD = Read and deleted
>> 
>> Status is 32 bytes so it can contain much more stuff. I choose this
>> design because it worked really fast with v1.

6) Also I still do not like how you have made STATUS of article.
    I'd make 10 Boolean fields instead.

    10 Boolean fields will use 10 bits only.
         
    Now you have VarChar[32] -- even empty takes 8 bytes.  32 bits.
        I have check. Most your records are as "U" or "R"
        Only few was "UR".

    And the most important is that you do SEARCHES LIKE now,
    but you will be able do searches on BOOLEAN fields, which are
    incredible fast.

    and Important is that as I see in SINGLE query you usually check
    only 1-4 states. But not all 10.


------------------------
7) I will check our tests now and will make new build of V4RB.
    So you can check with my latest optimizations + do db.Clone...

I think the most easy for you now convert STATUS field to Boolean fields.
Such function can be made in 30 minutes. Then do not changing nothing more
in app code, you can test new structure of db from viSQL doing your
"thread query".    

    Create 10 fields

    tbl.FirstRecord

    Loop 
        // switch on 10 states to set 10 new fields.
        case state 
            U
                fldU.value = true
            R
                fldR.value = true


        tbl.UpdateRecord
        tbl.NextRecord

    next


If you will see speedup in 20-30% then there is sense do this improvement of
db design...

Main point, 
    you need reduce number of LIKES as much as possible.
    me need try optimize LIKE more as much as possible.

-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]




More information about the Valentina mailing list