[V4RB] Re: Unicode workaround via UTF-16 - problems

Ruslan Zasukhin sunshine at public.kherson.ua
Fri Nov 7 11:09:09 CST 2003


on 11/7/03 10:23 AM, Dave Addey at dave.addey at dsl.pipex.com wrote:

> Hi Ruslan,
> 
> Thanks for the response!  I'm still having a few problems...
> 
> >> Then I think you need to use VarBinary or FixedBinary strings
> 
> This sounds ideal! Thanks.
> 
> >>> In theory, byte-based sorting on a UTF-16 string (stored and referenced as
> >>> bytes) is a valid sorting process.
>> 
>> I think this is not correct, Dave.
> 
> You're right.  2 bytes is not enough for *all* characters in the world.  And
> characters which require more than 2 bytes would "break" my sorting (as
> UTF-16 uses 4 bytes to store them, with the first 2 bytes as an identifier).
> 
> But, according to IBM...
> 
> "All of the most common characters in use for all modern writing systems
> are already represented with 2 bytes. Characters in surrogate space take
> 4 bytes, but as a proportion of all world text they will always be very
> rare."
> 
> So I should be pretty safe :-)
> 
> This quote is from an excellent article I found at:
> 
> http://www-106.ibm.com/developerworks/library/utfencodingforms/
> 
>> I still think this will not work for SOME HARD languages.
> 
> I agree.  But see the quote above :-)

Dave,

I think the problem is not in the 4-byte characters.
As I have read, some accented characters (e.g. a' in German) can be expanded
into 2 characters when sorting is done.

This means that your way of sorting can SOMETIMES give glitches.
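
A minimal sketch of the kind of glitch meant here, written in Python rather
than Valentina or REALbasic code. The expansion table is a hypothetical,
simplified stand-in for German phone-book style rules, and the names
utf16_byte_key, expansion_key and EXPANSIONS are assumptions made only for
this illustration:

```python
# Compare plain byte-order sorting of UTF-16 data with a sort that
# expands some accented characters into two characters first.

words = ["Zebra", "Äpfel", "Apfel", "Straße", "Strasse", "Strand"]

# Hypothetical, simplified expansion table (assumption for illustration;
# a real collation has many more rules and tie-breaking levels).
EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def utf16_byte_key(s):
    """Sort key: the raw big-endian UTF-16 bytes of the string."""
    return s.encode("utf-16-be")


def expansion_key(s):
    """Sort key: lower-cased text with accented characters expanded."""
    return "".join(EXPANSIONS.get(ch, ch) for ch in s.lower())


byte_sorted = sorted(words, key=utf16_byte_key)
expanded_sorted = sorted(words, key=expansion_key)

print(byte_sorted)
# ['Apfel', 'Strand', 'Strasse', 'Straße', 'Zebra', 'Äpfel']
print(expanded_sorted)
# ['Äpfel', 'Apfel', 'Strand', 'Straße', 'Strasse', 'Zebra']
```

With byte-order sorting, "Äpfel" lands after "Zebra" because the code unit
for "Ä" (0x00C4) is larger than the one for "Z" (0x005A), and no 4-byte
character is involved at all. A language-aware sort would place it next to
"Apfel".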


-- 
Best regards,
Ruslan Zasukhin      [ I feel the need...the need for speed ]
-------------------------------------------------------------
e-mail: ruslan at paradigmasoft.com
web: http://www.paradigmasoft.com

To subscribe to the Valentina mail list go to:
http://lists.macserve.net/mailman/listinfo/valentina
-------------------------------------------------------------


