V4RB, Jon, project /// More explanation.

jda jda at his.com
Wed Sep 15 14:19:11 CDT 2004


>>>  say I define some field as UTF8, 10 chars max length
>>>  I store some 10-char string with no double-byte chars ("abcdefghij")
>>>  the Vale kernel converts all in UTF16
>>>
>>>  result: I lose half my data
>>>
>>>  I got it well?
>>>
>>
>>  No -- at least that's not how I understand it.
>
>Yes, Jon, you lose.
>
>On disk, 10 bytes are reserved for a VString field.
>So if you put in 10 chars of 2 bytes each, that is 20 bytes.
>Only 10 bytes can be stored on disk.
>
>
>If, Erne, you will use VarChar(504), then all your 20 bytes will be written
>to disk. No problems.
>

OK, I'm stupid, but I just don't get your explanation.

He defined the field as storageEncoding UTF-8, and set max length to 
10. He has 10 chars to store. They are single byte characters.

Why won't all of them be stored and retrieved? They are stored as 
UTF-8, right, so one byte per character.
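
The byte counts in question are easy to check outside the database. A minimal sketch in plain Python (an illustration only, not Valentina code; it says nothing about what the kernel does internally):

```python
# Illustrative only: plain Python, not the Valentina API.
text = "abcdefghij"  # the 10-character example from the thread

utf8_bytes = len(text.encode("utf-8"))       # 1 byte per ASCII character
utf16_bytes = len(text.encode("utf-16-le"))  # 2 bytes per character, no BOM

print(utf8_bytes)   # 10
print(utf16_bytes)  # 20
```

So the string fits a 10-byte field only if it really is stored as UTF-8; as UTF-16 it needs 20 bytes.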



>
>         Vstring(50)     -- this is 50 chars
>         VarChar(50)     -- this is 50 chars
>         Text            -- unlimited
>
>    in this case Valentina allocate on disk bytes
>
>         Vstring(50)     -- 50 * 2 = 100 bytes on disk
>         VarChar(50)     -- 4KB+ pages.

I must say I don't like this at all. I, too, prefer to think in bytes.

It seems for UTF-16 the 50 means characters, but for UTF-8 it means bytes.

It should be the same for all encodings -- having it differ between 
encodings is *very* confusing and will certainly lead to many 
misunderstandings.

I suggest that 50 always means bytes, and if you want to store 50 
UTF-16 characters that you declare a VString of [100].
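
To make the "always bytes" rule concrete, here is a rough sketch in plain Python. The helpers `encoded_size` and `fits` are hypothetical names made up for this illustration; they are not part of Valentina and only model the rule as proposed above.

```python
# Hypothetical helpers, not part of Valentina: they model a field whose
# declared length is always a byte count, whatever the storage encoding.
def encoded_size(text: str, encoding: str) -> int:
    """Number of bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))

def fits(text: str, encoding: str, max_bytes: int) -> bool:
    """True if the encoded text fits in a field of max_bytes bytes."""
    return encoded_size(text, encoding) <= max_bytes

fifty_chars = "x" * 50  # 50 single-byte (ASCII) characters

print(fits(fifty_chars, "utf-8", 50))       # True:  50 bytes needed
print(fits(fifty_chars, "utf-16-le", 50))   # False: 100 bytes needed
print(fits(fifty_chars, "utf-16-le", 100))  # True:  declare [100] instead
```

Under this rule the declared number means the same thing for every encoding, and the person declaring the field does the multiplication.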



Jon


More information about the Valentina-beta mailing list