Unicode and length of field
Ruslan Zasukhin
sunshine at public.kherson.ua
Mon May 12 20:36:30 CDT 2003
on 5/12/03 7:20 PM, jda at jda at his.com wrote:
>> -- for UTF8 String[N] / VarChar[N]
>> N will play the role of length in bytes.
>> The problem is that if we store chars >127 here, they can take 2 bytes,
>> but some chars still use one byte.
>> We get a mess...
>
> Actually, in UTF-8 characters can take up to 4 bytes (I *think*
> that's the limit, but perhaps it's even more). The ellipsis character
> (…) for example, takes 3 bytes.
That seems to be the case if we think about UCS-4.
For 2-byte UCS-2 ... Aha, yes. Those can probably take 3 bytes in UTF-8 as well...
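
To check the byte counts discussed above, here is a minimal Python sketch (this is not Valentina code, and the helper name encoded_sizes is made up for illustration). It uses only Python's built-in str.encode to count how many bytes a few sample characters need in UTF-8 and in UTF-16:

```python
# Sample characters: ASCII letter, e-acute (U+00E9), ellipsis (U+2026),
# and one character outside the 16-bit range (U+1F600).
SAMPLES = ["A", "\u00e9", "\u2026", "\U0001F600"]


def encoded_sizes(ch):
    """Return (utf8_bytes, utf16_bytes) for a single character."""
    # "utf-16-le" is used so that no byte-order mark is counted.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


if __name__ == "__main__":
    for ch in SAMPLES:
        utf8_len, utf16_len = encoded_sizes(ch)
        print(f"U+{ord(ch):04X}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")
```

The ellipsis (U+2026) comes out as 3 bytes in UTF-8 and 2 bytes in UTF-16, while a character outside the 16-bit range needs 4 bytes in both encodings.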
>> 2) To make things more consistent, we can require the length for UTF16 also in
>> bytes. Then if you need 30 chars you say String[60], and Valentina itself adds 2
>> bytes for the terminating zero.
>> In this way both Unicode encodings, UTF8 and UTF16, play by the same rules, which
>> differ from the old Strings.
>>
>
> It seems to me that the best way to handle this is to put the burden
> on the user/programmer to set the number of bytes, and have Valentina
> allocate that amount regardless of text encoding. But I'd certainly
> be curious to hear what others think.
Yes, all the more so because we can START TO CONSIDER N in normal String[N] fields
as a length in bytes too. And that is also true.
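
To illustrate what "N as a length in bytes" would mean in practice, here is a rough Python sketch. It is only an assumption about how such a field could behave, not how Valentina actually stores strings, and fit_to_bytes is a hypothetical helper. It keeps as many whole characters as fit into N bytes for a given encoding, so a multi-byte character is never cut in half:

```python
def fit_to_bytes(text, max_bytes, encoding="utf-8"):
    """Return the longest prefix of text whose encoded form fits in max_bytes.

    Hypothetical helper: whole characters only, no terminating zero counted.
    Use "utf-16-le" for UTF-16 so that no byte-order mark is added.
    """
    used = 0
    kept = []
    for ch in text:
        size = len(ch.encode(encoding))  # bytes needed by this character
        if used + size > max_bytes:
            break  # the next character would overflow the field
        kept.append(ch)
        used += size
    return "".join(kept)


if __name__ == "__main__":
    thirty = "\u00e9" * 30  # 30 characters, each 2 bytes in UTF-8 and UTF-16
    print(len(fit_to_bytes(thirty, 60, "utf-16-le")))  # String[60] as UTF16
    print(len(fit_to_bytes(thirty, 30, "utf-8")))      # String[30] as UTF8
```

With this rule, String[60] in UTF16 holds all 30 of these characters, while String[30] in UTF8 holds only 15 of them, so the programmer has to pick N with the encoding and the expected text in mind.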
--
Best regards,
Ruslan Zasukhin [ I feel the need...the need for speed ]
-------------------------------------------------------------
e-mail: ruslan at paradigmasoft.com
web: http://www.paradigmasoft.com
To subscribe to the Valentina mail list go to:
http://lists.macserve.net/mailman/listinfo/valentina
-------------------------------------------------------------