Unicode and length of field

jda jda at his.com
Mon May 12 12:20:53 CDT 2003


>
>-- for UTF8 String[N] / VarChar[N]
>     N will play the role of length in bytes.
>     The problem is that if we store chars >127 here, they can take 2 bytes,
>     but some chars still use one byte.
>     We get a mess...

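Actually, in UTF-8 a character can take up to 4 bytes: any code point 
above U+FFFF needs 4. (The original UTF-8 definition allowed sequences 
of up to 6 bytes, but Unicode code points never need more than 4.) The 
ellipsis character (…, U+2026), for example, takes 3 bytes.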
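To make the variable width concrete, here is a minimal Python sketch (not Valentina code) that computes the UTF-8 length of a single character from its code point, using the standard ranges, and cross-checks the result against Python's built-in encoder:

```python
def utf8_len(ch: str) -> int:
    """Bytes needed to encode one code point in UTF-8."""
    cp = ord(ch)
    if cp < 0x80:
        return 1      # ASCII
    if cp < 0x800:
        return 2      # e.g. Latin-1 letters such as 'é'
    if cp < 0x10000:
        return 3      # rest of the Basic Multilingual Plane, e.g. '…'
    return 4          # supplementary planes, up to U+10FFFF

for ch in ["A", "\u00e9", "\u2026", "\U0001F600"]:
    # Cross-check the range table against Python's own UTF-8 encoder.
    assert utf8_len(ch) == len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X}: {utf8_len(ch)} byte(s)")
```

So a String[N] measured in bytes holds anywhere from N/4 to N characters, depending on the text.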

>
>
>2) To make things more consistent, we can also require the UTF16 length in
>bytes. Then if you need 30 chars you say String[60] and Valentina itself adds 2
>bytes for the END ZERO.
>In this way both Unicode encodings, UTF8 and UTF16, play by the same rules, which
>differ from those of the old Strings.
>
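The arithmetic in proposal 2 can be sketched as follows. The helper names `declared_size_utf16` and `allocated_size` are hypothetical, and the sketch assumes the 2-byte terminator described above. Note the hidden assumption that every character takes 2 bytes in UTF-16; characters outside the Basic Multilingual Plane are stored as a surrogate pair and take 4.

```python
def declared_size_utf16(max_chars: int) -> int:
    """Byte length N to declare, assuming every char is in the BMP (2 bytes)."""
    return 2 * max_chars

def allocated_size(declared_bytes: int) -> int:
    """Bytes reserved: N plus a 2-byte terminating zero (hypothetical)."""
    return declared_bytes + 2

n = declared_size_utf16(30)
print(n, allocated_size(n))  # 60 62

# The 2-bytes-per-char assumption breaks outside the BMP:
# such characters are encoded as a surrogate pair (4 bytes).
print(len("A".encode("utf-16-le")), len("\U0001F600".encode("utf-16-le")))  # 2 4
```

The `utf-16-le` codec is used so that Python does not prepend a byte-order mark to the count.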

It seems to me that the best way to handle this is to put the burden 
on the user/programmer to set the number of bytes, and have Valentina 
allocate that amount regardless of text encoding. But I'd certainly 
be curious to hear what others think.
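If the user declares a byte budget and the database allocates exactly that, the application has to check fit in bytes, not characters. A minimal Python sketch under that assumption follows; `fits` and `truncate_utf8` are hypothetical helpers, not part of any Valentina API. The same 5-character string fits an 8-byte field in UTF-8 but not in UTF-16:

```python
def fits(text: str, n_bytes: int, encoding: str) -> bool:
    """True if text fits in a field of n_bytes under the given encoding."""
    return len(text.encode(encoding)) <= n_bytes

def truncate_utf8(text: str, n_bytes: int) -> str:
    """Cut text to at most n_bytes of UTF-8 without splitting a character."""
    # errors="ignore" drops the incomplete multi-byte sequence left at the end
    return text.encode("utf-8")[:n_bytes].decode("utf-8", errors="ignore")

s = "caf\u00e9\u2026"                  # "café…": 5 characters
print(len(s.encode("utf-8")))          # 8  (1+1+1+2+3)
print(len(s.encode("utf-16-le")))      # 10 (5 x 2)
print(fits(s, 8, "utf-8"), fits(s, 8, "utf-16-le"))  # True False
print(truncate_utf8(s, 7))             # café
```

Truncating by raw bytes alone would leave half of the ellipsis in the field; decoding with `errors="ignore"` discards that partial sequence instead.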

Jon

