Unicode and length of field

Ruslan Zasukhin sunshine at public.kherson.ua
Mon May 12 20:36:30 CDT 2003


on 5/12/03 7:20 PM, jda at jda at his.com wrote:

>> -- for UTF8 String[N] / VarChar[N]
>>     N will play the role of the length in bytes.
>>     The catch is that if we store chars >127 here, they can take 2 bytes,
>>     but some chars can still use one byte.
>>     We get a mess...
> 
> Actually, in UTF-8 characters can take up to 4 bytes (I *think*
> that's the limit, but perhaps it's even more). The ellipsis character
> (…) for example, takes 3 bytes.

That seems to be the case if we think about UCS-4.
For 2-byte UCS-2 ... Aha, yes. A char can probably take 3 bytes as well...
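
To make the byte counts concrete, here is a small Python sketch (Python is only an illustration language here, it has nothing to do with the Valentina API). The helper name utf8_len is made up for this example. It just encodes one character and counts the resulting bytes:

```python
# Illustration only: how many bytes one character takes in UTF-8.
# utf8_len is a hypothetical helper name, not part of any Valentina API.

def utf8_len(ch: str) -> int:
    """Return the number of bytes needed to store `ch` in UTF-8."""
    return len(ch.encode("utf-8"))

samples = {
    "A": "A",                      # U+0041, plain ASCII
    "e-acute": "\u00e9",           # U+00E9, a char > 127
    "ellipsis": "\u2026",          # U+2026, the ellipsis character
    "G clef": "\U0001D11E",        # U+1D11E, outside the 16-bit range
}

for name, ch in samples.items():
    print(f"{name}: {utf8_len(ch)} byte(s)")
```

So a char that fits in 16 bits can need up to 3 bytes in UTF-8, and chars beyond the 16-bit range need 4.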

>> 2) To make things more consistent, we can require the length for UTF16 also in
>> bytes. Then if you need 30 chars you say String[60], and Valentina itself adds 2
>> bytes for the END ZERO.
>> In this way both Unicodes, UTF8 and UTF16, play by the same rules, which
>> differ from the old Strings.
>> 
>> 
> 
> It seems to me that the best way to handle this is to put the burden
> on the user/programmer to set the number of bytes, and have Valentina
> allocate that amount regardless of text encoding. But I'd certainly
> be curious to hear what others think.

Yes, all the more so since we can START TO CONSIDER N in normal String[N]
fields as a length in bytes as well. And that is also true.
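
A rough sketch of what "N is a length in bytes" would mean for the programmer, again in Python and only as an illustration. The function names fits_in_field and truncate_to_bytes are hypothetical, and the assumption (taken from the proposal quoted above) is that the terminating zero is added by the database and is not counted in N:

```python
# Illustration only: treating N in String[N] as a byte budget.
# fits_in_field and truncate_to_bytes are hypothetical names.
# Assumption: the END ZERO is added separately and is not part of N.

def fits_in_field(text: str, n_bytes: int, encoding: str) -> bool:
    """True if `text`, stored in `encoding`, needs at most n_bytes bytes."""
    return len(text.encode(encoding)) <= n_bytes

def truncate_to_bytes(text: str, n_bytes: int) -> str:
    """Cut UTF-8 text to at most n_bytes without splitting a character."""
    raw = text.encode("utf-8")[:n_bytes]
    # An incomplete multi-byte sequence at the cut point is dropped.
    return raw.decode("utf-8", errors="ignore")

# 30 two-byte UTF-16 chars need String[60]; "utf-16-le" avoids a BOM.
print(fits_in_field("a" * 30, 60, "utf-16-le"))
print(fits_in_field("a" * 31, 60, "utf-16-le"))

# The same 30 ASCII chars need only 30 bytes in UTF-8.
print(fits_in_field("a" * 30, 30, "utf-8"))

# "ab" plus an ellipsis is 5 bytes in UTF-8; a 4-byte field keeps only "ab".
print(truncate_to_bytes("ab\u2026", 4))
```

With this rule the same N means the same storage for both encodings, and it is up to the programmer to choose N large enough for the expected chars.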

-- 
Best regards,
Ruslan Zasukhin      [ I feel the need...the need for speed ]
-------------------------------------------------------------
e-mail: ruslan at paradigmasoft.com
web: http://www.paradigmasoft.com

To subscribe to the Valentina mail list go to:
http://lists.macserve.net/mailman/listinfo/valentina
-------------------------------------------------------------


