Some UTF-8 observations

Ruslan Zasukhin sunshine at public.kherson.ua
Sat Nov 11 09:57:28 CST 2006


On 11/11/06 5:57 AM, "Kem Tekinay" <ktekinay at mactechnologies.com> wrote:

Hi Kem,

> I modified the Valentina example that I wrote for my Data-On-Demand ListBox to
> use UTF-8, and immediately discovered two things:
> 
> First, the size of the database is not appreciably smaller. After importing
> over 5 million records of 7 fields (3 indexed strings, 4 non-indexed doubles),
> the size of the files was reduced to just under 700 MB from 766 MB.

Okay, let's think. Consider the following points:

1) if you have STRING[N]...
    the size of the TABLE column should become 2 times smaller.

2) if you have VARCHAR[N]...
    the size of the TABLE column should become ALMOST 2 times smaller.

3) if you have TEXT[N]...
    the size becomes smaller only if the text is BIGGER than the segment size.
    If the segment size is 256 bytes and the strings are 200 bytes in UTF16,
    then the 100 bytes of UTF8 will still occupy one whole 256-byte segment.
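
To make point 3 concrete, here is a rough Python sketch of the arithmetic. It
assumes a simplified model in which a TEXT value always occupies a whole
number of fixed-size segments; the function names are made up for this
illustration and are not part of any Valentina API.

```python
def segments_needed(byte_len: int, segment_size: int = 256) -> int:
    """Whole segments required for byte_len bytes (assumed storage model)."""
    # Ceiling division: a partly filled segment still costs a full segment.
    return -(-byte_len // segment_size)


def disk_bytes(text: str, encoding: str, segment_size: int = 256) -> int:
    """Bytes consumed on disk by text in the given encoding, under this model."""
    payload = len(text.encode(encoding))
    return segments_needed(payload, segment_size) * segment_size


short = "a" * 100   # 200 bytes in UTF-16, 100 bytes in UTF-8
long_ = "a" * 300   # 600 bytes in UTF-16, 300 bytes in UTF-8

# Short text: both encodings fit in one segment, so nothing is saved.
print(disk_bytes(short, "utf-16-le"), disk_bytes(short, "utf-8"))  # 256 256

# Longer text: UTF-16 needs 3 segments, UTF-8 needs only 2.
print(disk_bytes(long_, "utf-16-le"), disk_bytes(long_, "utf-8"))  # 768 512
```

So under this model the saving shows up only once the UTF16 form of the text
crosses a segment boundary that the UTF8 form does not.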


Valentina Studio has a FEATURE to help here:

    In the Schema Editor, switch to TREE VIEW.
    There Vstudio can show the on-disk size of each table/field,
    so you can see which fields became smaller and which did not.

Also, during these experiments it's good to keep the database in 4 files,
to see whether it is the .ind or the .dat file that becomes smaller.
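
If you want to compare the files outside of Vstudio, something like the
following Python sketch could do it. It only looks at file sizes grouped by
extension; the directory arguments and function names are hypothetical, and
it assumes the UTF16 and UTF8 versions of the database live in separate folders.

```python
from collections import defaultdict
from pathlib import Path


def sizes_by_extension(db_dir):
    """Return {extension: total bytes} for the files directly in db_dir."""
    totals = defaultdict(int)
    for path in Path(db_dir).iterdir():
        if path.is_file():
            totals[path.suffix.lower()] += path.stat().st_size
    return dict(totals)


def compare(before_dir, after_dir):
    """Print per-extension sizes for two database folders side by side."""
    before = sizes_by_extension(before_dir)
    after = sizes_by_extension(after_dir)
    for ext in sorted(set(before) | set(after)):
        b, a = before.get(ext, 0), after.get(ext, 0)
        print(f"{ext or '(none)':8} {b:>14,} -> {a:>14,} bytes")
```

Calling compare() with the UTF16 folder and the UTF8 folder then shows at a
glance whether the index file or the data file accounts for the difference.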

----------------------
> Second, searches do not work correctly when using UTF-8. I got no or strange
> results when I used StartsWith searches on the indexed strings. For example,
> the following statement in Studio should yield one record:
> 
> select Zip_Codes.Zip_Code, city from Zip_Codes where city like 'mount kisc%'

Does it return zero records?
 
> This works fine on the UTF16 version.

I see, although it is strange. I did fix this kind of search, and LIKE/REGEX also.


-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]




More information about the Valentina-beta mailing list