odd indexed string behaviour?

Ruslan Zasukhin ruslan_zasukhin at valentina-db.com
Tue Aug 28 09:10:56 CDT 2012


On 8/28/12 4:09 PM, "George Parkinson" <george at microtherapy.ca> wrote:

Hi George, 

> hi ruslan,
> the problem is two-fold...storage and indexing
> 
> 1 - storage: for this project,  i ship out read-only survey results.
> clients never start with an empty db...they consume what is shipped.
> i'm going to have major push back from clients (000's) when i replace what i
> ship now with something approaching twice the size.

1) What was BEFORE Valentina ?

2) what size was in the past?

3) it is not x2 factor, because yet exists numeric data in db.
Also Valentina has often more compact format e.g. For indexes,
So all together this can be less than x2 I guess.


4) you worry about size to download?  Then use ZIP.
    Valentina db files can be compressed very good, and that UTF16 zeros in
many places will be compressed.

If you worry about space on client computer ..
Your dbs is going to be many GBs ?

 
> 2 - indexing: the first 256 symbols of each record is indeed 256 when using
> utf8, but becomes 128 when using utf16.
> i have a field of 188 characters (ascii 34-127) that is essentially the
> index...but with utf16 indexing won't work.

Wow ... KEY value  188 chars ??

As for me this can be bad db design.

And if before Valentina you did use e.g. MySQL or I think other RDBMS,
Then you should know that if you choose such huge Primary KEY,
Then you get huge penalty also. In particular in the EACH index of such
table, this KEY will be stored. 10 indexed fields, 10 times the same KEY of
size 188 ... Huge penalty.

 
> splitting into two fields, with each indexed won't work because the 128 symbol
> limit is for the record.

KEY should be just an ULONG in ideal.

May be you mean that this field has UNIQUE index,
To ensure that there is no duplicates ?

> dropping the index is not an option because of the volume of data (hundreds of
> thousands of records a month with an average retention of 4 yrs).
> 
> this is why i'm looking for utf8 in valentina.

So lets assume you have this field and index on it ...

What exactly happens next?

*  user types up to 188 chars in some dialog to find a record?

* or what kinds of searches you do with this KEY?
  how many records are found in result usually ?


-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]




More information about the Valentina mailing list