Unicode and length of field
jda
jda at his.com
Mon May 12 12:20:53 CDT 2003
>
>-- for UTF8 String[N] / VarChar[N]
> N will play the role of the length in bytes.
> The issue is that if we store chars >127 here, they can take 2 bytes,
> but some chars still use one byte.
> We get a mess...
Actually, in UTF-8 a character can take up to 4 bytes (the original
design allowed sequences of up to 6, but every code point up to
U+10FFFF fits in 4). The ellipsis character (…, U+2026), for example,
takes 3 bytes.
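For anyone who wants to check the byte counts, here is a small sketch in
Python (nothing Valentina-specific; the helper name utf8_len is just
something I made up for illustration):

```python
# Hypothetical helper: number of bytes one string occupies in UTF-8.
def utf8_len(text: str) -> int:
    return len(text.encode("utf-8"))

# One sample per UTF-8 sequence length (written as escapes so the
# source file stays pure ASCII).
samples = {
    "A": "A",                      # U+0041, ASCII
    "e-acute": "\u00e9",           # U+00E9, Latin-1 range
    "ellipsis": "\u2026",          # U+2026, the character mentioned above
    "emoji": "\U0001F600",         # U+1F600, outside the BMP
}

for name, ch in samples.items():
    print(f"{name}: {utf8_len(ch)} byte(s)")
```

So a field declared as N bytes can hold anywhere from N/4 to N characters,
depending on what is stored in it.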
>
>
>2) To make things more consistent, we can require the length for UTF16 also in
>bytes. Then if you need 30 chars you say String[60], and Valentina itself adds 2
>bytes for the terminating zero.
>This way both Unicode encodings, UTF8 and UTF16, play by the same rules, which
>differ from the old Strings.
>
It seems to me that the best way to handle this is to put the burden
on the user/programmer to set the number of bytes, and have Valentina
allocate that amount regardless of text encoding. But I'd certainly
be curious to hear what others think.
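
To make the idea concrete, here is a rough Python sketch of a fixed byte
budget that is independent of encoding. This is only my illustration, not
how Valentina does (or would do) it; the function name fit_to_bytes and
its parameters are hypothetical. It assumes the field should never end in
the middle of a character, so it stops before the first character that
would not fit.

```python
# Hypothetical sketch: store as much of `text` as fits in `n_bytes`,
# whatever the encoding, without splitting a character.
def fit_to_bytes(text: str, n_bytes: int, encoding: str = "utf-8") -> bytes:
    out = bytearray()
    for ch in text:
        encoded = ch.encode(encoding)
        if len(out) + len(encoded) > n_bytes:
            break  # the next character would overflow the field
        out += encoded
    return bytes(out)

# "caf" + e-acute + ellipsis: 3 + 2 + 3 = 8 bytes in UTF-8.
word = "caf\u00e9\u2026"
stored = fit_to_bytes(word, 5)            # room for "caf" and the e-acute only
print(len(stored), stored.decode("utf-8") == "caf\u00e9")

# The same rule for UTF-16 ("utf-16-le" avoids the 2-byte BOM that the
# plain "utf-16" codec prepends): a 60-byte field holds 30 BMP characters.
stored16 = fit_to_bytes("a" * 40, 60, "utf-16-le")
print(len(stored16) // 2)
```

The trade-off is that the programmer has to think in bytes and pick N for
the worst case they care about, but the allocation rule stays the same
for every encoding.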
Jon