UTF8 vs UTF16

Ruslan Zasukhin sunshine at public.kherson.ua
Wed Nov 1 21:46:10 CST 2006


On 11/1/06 9:31 PM, "jda" <jda at his.com> wrote:

>> On 11/1/06 7:23 PM, "jda" <jda at his.com> wrote:
>> 
>>>  Hi Ruslan,
>>> 
>>>  OK, everything is up and running. The savings I see with UTF8 are
>>>  surprisingly small. I have one db with 4500 records (lots and lots of
>>>  vtext). The size is as follows:
>>> 
>>>  UTF-8 = 19.1 MB (20,041,728 bytes)
>>> 
>>> UTF-16 = 21 MB (22,056,960 bytes)
>>> 
> 
> Some followup.
> 
> I looked with VStudio at one VText field (authors):
> 
> UTF-16 -> 1436.25 KB
> 
> UTF-8 -> 1134.51 KB
> 
> So only about a 26% compression (I have only ASCII in this field, so
> all UTF8 characters take 1 byte). Should be closer to 50%, I think.
> 
> I can compress and upload the databases if you want to inspect them.

No need: this is a BLOB field, which allocates disk space in units of the SEGMENT SIZE.

In that example I set the segment size to 128,
    but the strings are only 40 bytes in UTF-16
        and 20 bytes in UTF-8.
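For ASCII-only text, each character takes 1 byte in UTF-8 and 2 bytes in UTF-16, which is where the 20 vs 40 byte figures come from. A quick Python check (the 20-character sample string is hypothetical, chosen only to match those sizes):

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 bytes, UTF-16 bytes) for text, without a byte-order mark."""
    # "utf-16-le" is used because plain "utf-16" prepends a 2-byte BOM.
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

sample = "Zasukhin, Ruslan A. "  # hypothetical 20-character ASCII author string
print(encoded_sizes(sample))  # (20, 40)
```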

So how many segments are used per record?
Right. In both cases ONE segment.

So in that example the .blb file has exactly the same size for UTF-8 and UTF-16.
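The allocation rule can be sketched as rounding each value up to a whole number of segments. This is a simplified model, assuming one BLOB value per record and ignoring any per-segment header or bookkeeping Valentina may store; `segments_needed` and `allocated_bytes` are hypothetical helpers, not Valentina API calls.

```python
def segments_needed(value_bytes: int, segment_size: int) -> int:
    """Whole segments occupied by a value (integer ceiling division)."""
    return (value_bytes + segment_size - 1) // segment_size

def allocated_bytes(value_bytes: int, segment_size: int) -> int:
    """Disk space reserved for one value under segment-granular allocation."""
    return segments_needed(value_bytes, segment_size) * segment_size

# The example from the thread: segment size 128, 40-byte vs 20-byte strings.
for label, size in (("UTF-16", 40), ("UTF-8", 20)):
    print(label, segments_needed(size, 128), allocated_bytes(size, 128))
# UTF-16 1 128
# UTF-8 1 128
```

Both encodings round up to the same single 128-byte segment, so the UTF-8 saving is invisible on disk.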

-----
To gain more, you can try setting a smaller segment size.
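Under the same simplified model (no per-segment overhead assumed), a smaller segment size lets the UTF-8 saving show up on disk. The segment sizes below are illustrative only; note the saving is not monotonic, and smaller segments also mean more segments per long text, so the best value depends on the real string lengths.

```python
def allocated_bytes(value_bytes: int, segment_size: int) -> int:
    """Disk space reserved for one value: size rounded up to whole segments."""
    return -(-value_bytes // segment_size) * segment_size

for seg in (128, 64, 32, 16, 8):
    utf16 = allocated_bytes(40, seg)  # 40-byte UTF-16 string from the thread
    utf8 = allocated_bytes(20, seg)   # 20-byte UTF-8 string from the thread
    print(seg, utf16, utf8, f"{1 - utf8 / utf16:.0%}")
# 128 128 128 0%
# 64 64 64 0%
# 32 64 32 50%
# 16 48 32 33%
# 8 40 24 40%
```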


-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]