Unicode workaround via UTF-16 - problems

Ruslan Zasukhin sunshine at public.kherson.ua
Wed Nov 5 20:10:09 CST 2003


on 11/5/03 7:53 PM, Dave Addey at dave.addey at dsl.pipex.com wrote:

> Hi all,
> 
> I've read on this list about Unicode coming in v2.0, and I look forward to
> when it does.
> 
> In the meantime, I've been trying a workaround.  I'd like to store my UTF-16
> strings in Valentina, without Valentina knowing or caring that they are in
> this format.  I use REALbasic 5, and can convert back and forth between text
> encodings at will, so I'm able to take a random set of bytes, of known
> string encoding, and use this string in REALbasic.

Then I think you need to use VarBinary or FixedBinary fields.

> In theory, byte-based sorting on a UTF-16 string (stored and referenced as
> bytes) is a valid sorting process.

I think this is not correct, Dave.

From what I have read in the docs of the IBM ICU library, sorting on bytes
does NOT work. The IBM guys first convert the string into a special form
(a sort key) that can then be sorted.
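
To illustrate the point, here is a minimal sketch in Python (not REALbasic or
Valentina code). It compares plain byte order on big-endian UTF-16 with order
by a derived key. The function rough_sort_key is a hypothetical, very crude
stand-in for a real collation key such as the ones ICU builds; it only strips
accents and ignores case, and the word list is made up for the example.

```python
import unicodedata

# Made-up sample data; "\u00e9" is e-acute, so the third word is "eclair"
# with an accented first letter.
words = ["zebra", "Apple", "\u00e9clair", "apple"]

def utf16_bytes(s):
    """Big-endian UTF-16 bytes, i.e. what a byte-wise sort would compare."""
    return s.encode("utf-16-be")

def rough_sort_key(s):
    """Crude stand-in for a collation key: drop accents, ignore case."""
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base.casefold()

byte_order = sorted(words, key=utf16_bytes)
key_order = sorted(words, key=rough_sort_key)

print(byte_order)  # accented word lands after "zebra"
print(key_order)   # accented word lands between "apple" and "zebra"
```

With byte order the accented word is placed after "zebra", because U+00E9 is
numerically larger than every ASCII letter. A real collation key also handles
language-specific rules, which this sketch does not attempt.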

> Because I can pretty much always assume that each character takes 2 bytes in
> UTF-16, I'm happy that any sort I do will come back accurately enough sorted
> for my liking.  Sorting would only 'break' if my string contained characters
> which are outside of the UTF-16 set, and this is unlikely in normal usage (in
> my opinion).

I still think this will not work for SOME of the harder languages.
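
A small Python sketch of the "2 bytes per character" assumption. Characters
outside the Basic Multilingual Plane are still valid UTF-16, but they take a
surrogate pair (4 bytes), and byte order on those pairs no longer matches code
point order. The two characters below are picked only for illustration.

```python
yen = "\uffe5"        # FULLWIDTH YEN SIGN, inside the BMP
clef = "\U0001d11e"   # MUSICAL SYMBOL G CLEF, outside the BMP

yen_bytes = yen.encode("utf-16-be")    # one 16-bit unit: FF E5
clef_bytes = clef.encode("utf-16-be")  # surrogate pair: D8 34 DD 1E

# Python compares str values by code point, so the BMP character sorts first.
by_code_point = sorted([clef, yen])

# Byte-wise, the surrogate lead byte 0xD8 is smaller than 0xFF, so the
# order flips.
by_bytes = sorted([clef, yen], key=lambda s: s.encode("utf-16-be"))

print(len(yen_bytes), len(clef_bytes))
print(by_code_point == [yen, clef], by_bytes == [clef, yen])
```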


> But, my problem is this.  When I try and store a Unicode UTF-16 string in a
> Valentina string field, I can't do so for any strings that begin with the
> null character (&h00).  Strings that start with other characters are fine
> (e.g. Japanese characters which use both bytes in UTF-16).  And since most
> strings in Western alphabets contain mostly ASCII characters, most of my
> UTF-16 strings begin with &h00 .

This is why I say to use FixedBinary and VarBinary fields.
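
A Python sketch of why the leading &h00 matters. It ASSUMES the string field
handles its value like a C-style NUL-terminated string; that is only a
plausible explanation for the empty result, not something confirmed here. The
album title is a made-up example.

```python
album = "Abbey Road"              # hypothetical album title
raw = album.encode("utf-16-be")   # the bytes that would be stored

first_byte = raw[0]               # ASCII text in UTF-16BE starts with 0x00

# Assumption: a NUL-terminated string stops at the first zero byte,
# which here leaves nothing at all.
as_c_string = raw.split(b"\x00", 1)[0]

# A binary field keeps every byte, so decoding gives the text back.
round_trip = raw.decode("utf-16-be")

print(first_byte, as_c_string, round_trip)
```

Stored as raw bytes, the value survives unchanged and can be converted back
to text with the known encoding.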


> So, for example:
> 
>   LibraryDB.ImportedSongsTable.Album.Value = mysong.Album
> 
> ...where LibraryDB.ImportedSongsTable.Album is a VVarChar, and mysong.Album
> is a UTF-16 string.
> 
> In this example, if the first byte of mysong.Album is &h00, the value of
> LibraryDB.ImportedSongsTable.Album.Value is nil, even though there are other
> characters after the initial &h00 value in mysong.Album .
> 
> All I really want is to transfer some bytes into a Valentina field (I'm
> assuming this would be a string) and get them back out again, with byte
> sorting on this field.  Is there a way to do this in the current release?
> 
> This would allow me to add Unicode support to my app before it is available
> 'native' in Valentina.
> 
> BTW, I'm using Valentina 1.9.7, REALBasic 5.2.1 on Mac OS X 10.2.8.

-- 
Best regards,
Ruslan Zasukhin      [ I feel the need...the need for speed ]
-------------------------------------------------------------
e-mail: ruslan at paradigmasoft.com
web: http://www.paradigmasoft.com

To subscribe to the Valentina mail list go to:
http://lists.macserve.net/mailman/listinfo/valentina
-------------------------------------------------------------


