V4RB, Jon, project /// More explanation.
jda
jda at his.com
Wed Sep 15 14:19:11 CDT 2004
>>> say I define some field as UTF8 10 chars max length
>>> I store some 10 chars string with no double bytes chars ("abcdefghij")
>>> the Vale kernel converts all in UTF16
>>>
>>> result: I lose half my data
>>>
>>> Did I get that right?
>>>
>>
>> No -- at least that's not how I understand it.
>
>Yes, Jon, you lose.
>
>On disk, 10 bytes are reserved for a Vstring field.
>So if you put in 10 chars of 2 bytes each, that is 20 bytes.
>Only 10 bytes can be stored on disk.
>
>
>If, Erne, you use VarChar(504), then all your 20 bytes will be written
>to disk. No problems.
>
OK, I'm stupid, but I just don't get your explanation.
He defined the field as storageEncoding UTF-8, and set max length to
10. He has 10 chars to store. They are single byte characters.
Why won't they all be stored and retrieved? They are stored as
UTF-8, right, so one byte per character.
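
To make the byte counts in this exchange concrete, here is a minimal sketch in plain Python. It is not Valentina code and says nothing about what the kernel actually does; the 10-byte slot is only the hypothetical case described in the quoted reply.

```python
# Illustration only: plain Python, not the Valentina API.
text = "abcdefghij"  # 10 single-byte (ASCII) characters

utf8_bytes = text.encode("utf-8")       # 1 byte per ASCII character
utf16_bytes = text.encode("utf-16-le")  # 2 bytes per character, no BOM

print(len(utf8_bytes))   # 10
print(len(utf16_bytes))  # 20

# Hypothetical case: a 10-byte slot that receives the UTF-16 form
# has room for only the first 5 characters.
truncated = utf16_bytes[:10].decode("utf-16-le")
print(truncated)  # abcde
```

If the field really stores UTF-8, the 10 characters fit in 10 bytes. Half the data is lost only if the 2-byte form is what gets written into the 10-byte slot.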
>
> Vstring(50) -- this is 50 chars
> VarChar(50) -- this is 50 chars
> Text -- unlimited
> in this case Valentina allocates these bytes on disk:
> Vstring(50) -- 50 * 2 = 100 bytes on disk
> VarChar(50) -- 4KB+ pages.
I must say I don't like this at all. I, too, prefer to think in bytes.
It seems for UTF-16 the 50 means characters, but for UTF-8 it means bytes.
It should be the same for all encodings -- having it differ between
encodings is *very* confusing and will certainly lead to many
misunderstandings.
I suggest that 50 always mean bytes, and that if you want to store 50
UTF-16 characters you declare a VString of [100].
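
A minimal sketch of what a byte-based limit would imply, again in plain Python rather than Valentina code. The helper name `max_chars_in_budget` is made up for illustration; it counts how many leading characters of a string fit in a given number of bytes under a given encoding.

```python
# Illustration only: hypothetical helper, not part of any Valentina API.
def max_chars_in_budget(byte_budget: int, encoding: str, sample: str) -> int:
    """Count how many leading characters of sample fit in byte_budget bytes."""
    used = 0
    count = 0
    for ch in sample:
        size = len(ch.encode(encoding))  # bytes this character needs
        if used + size > byte_budget:
            break
        used += size
        count += 1
    return count


ascii_text = "a" * 100

print(max_chars_in_budget(50, "utf-8", ascii_text))       # 50
print(max_chars_in_budget(50, "utf-16-le", ascii_text))   # 25
print(max_chars_in_budget(100, "utf-16-le", ascii_text))  # 50
```

Under this rule the declared number has one meaning for every encoding, and the number of characters that fit follows from the bytes each character needs.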
Jon