Unicode and length of field

Mon May 12 12:20:53 CDT 2003

>
>-- for UTF8 String[N] / VarChar[N]
>     N will play the role of length in bytes.
>     The problem is that if we store chars >127 here, they can take 2 bytes,
>     but some chars can still use one byte.
>     We get a mess...

Actually, in UTF-8 a character can take up to 4 bytes (I *think* 
that's the limit, but perhaps it's even more). The ellipsis character 
(…), for example, takes 3 bytes.
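
A quick way to check byte counts like these is to encode single 
characters and measure the result. The sketch below is only an 
illustration in Python, not anything Valentina does; the helper name 
encoded_len and the sample characters are my own choices.

```python
# Count how many bytes one character occupies in a given encoding.
# encoded_len is a hypothetical helper written for this illustration.
def encoded_len(ch, encoding):
    return len(ch.encode(encoding))

# Sample characters, written as escapes so the source stays plain ASCII:
# "A", e-acute, the ellipsis, and a character outside the 16-bit range.
samples = ["A", "\u00e9", "\u2026", "\U0001F600"]

for ch in samples:
    utf8 = encoded_len(ch, "utf-8")
    # "utf-16-le" is used so that no byte-order mark is counted.
    utf16 = encoded_len(ch, "utf-16-le")
    print(f"U+{ord(ch):04X}  UTF-8: {utf8} byte(s)  UTF-16: {utf16} byte(s)")
```

The same character count can therefore need quite different amounts 
of storage depending on which characters are actually stored.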

>
>
>2) To make things more consistent, we can require the length in bytes for
>UTF16 as well. Then if you need 30 chars you say String[60], and Valentina
>itself adds 2 bytes for the END ZERO.
>In this way both Unicode encodings, UTF8 and UTF16, play by the same rules,
>which differ from the old Strings.
>

It seems to me that the best way to handle this is to put the burden 
on the user/programmer to set the number of bytes, and have Valentina 
allocate that amount regardless of text encoding. But I'd certainly 
be curious to hear what others think.
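
To make the idea concrete, here is a rough sketch of what a fixed 
byte budget means for the text that fits. It is an assumption-laden 
illustration in Python, not Valentina's behavior or API: the function 
name fit_to_bytes is hypothetical, and cutting off at a character 
boundary is just one possible policy.

```python
# Keep as many whole characters as fit into a fixed number of bytes.
# fit_to_bytes is a hypothetical helper, not part of any Valentina API.
def fit_to_bytes(text, encoding, max_bytes):
    kept = []
    used = 0
    for ch in text:
        size = len(ch.encode(encoding))
        # Stop before a character that would overflow the budget,
        # so a multi-byte character is never split in half.
        if used + size > max_bytes:
            break
        kept.append(ch)
        used += size
    return "".join(kept)

# A 6-byte field in UTF-8: five characters fit, the ellipsis does not.
print(fit_to_bytes("na\u00efve\u2026", "utf-8", 6))

# A 60-byte field in UTF-16 holds 30 characters of this kind.
print(len(fit_to_bytes("x" * 40, "utf-16-le", 60)))
```

With this approach the field size is always known in advance, and 
the programmer decides how many bytes are enough for the expected 
text.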

Jon