[V4RB] Re: Unicode workaround via UTF-16 - problems
Dave Addey
dave.addey at dsl.pipex.com
Fri Nov 7 10:22:10 CST 2003
Hi Ruslan,
> Dave,
>
> I think the problem is not in 4-byte chars.
> As I have read, some accented chars (e.g. in German, a' can be expanded
> into 2 chars when we do sorting).
>
> This means that your way of sorting SOMETIMES can give glitches.
Oh, I see what you mean. Hmm. Well, I'll give it a go and see how I get
on, i.e. see if the sorting is good *enough* until v2.0 of Valentina :-)
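To illustrate Ruslan's point, here is a minimal Python sketch comparing a plain code-unit sort (roughly what sorting raw UTF-16 gives you) with a sort key that expands German umlauts into two letters first. The expansion table and function names are hypothetical and only loosely follow the German "phone book" convention; this is not how Valentina collates.

```python
# Hypothetical, simplified expansion table: each umlaut/sharp s becomes two
# letters before comparison (loosely modelled on German "phone book" order).
EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue",
              "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}

def utf16_code_unit_key(s: str) -> bytes:
    # Big-endian UTF-16 bytes compare in the same order as the 16-bit code units.
    return s.encode("utf-16-be")

def expanded_key(s: str) -> str:
    # Expand the special characters, then compare case-insensitively.
    return "".join(EXPANSIONS.get(ch, ch) for ch in s).casefold()

words = ["Zebra", "Äpfel", "Apfel", "Adler"]
print(sorted(words, key=utf16_code_unit_key))  # ['Adler', 'Apfel', 'Zebra', 'Äpfel']
print(sorted(words, key=expanded_key))         # ['Adler', 'Äpfel', 'Apfel', 'Zebra']
```

The code-unit sort puts "Äpfel" after "Zebra" because Ä (U+00C4) is numerically larger than every ASCII letter; that is the kind of glitch Ruslan describes.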
>> So, I've switched to using VVarBinary rather than VVarChar for fields where
>> I want to store my UTF-16 data. My problem comes when I try and sort on
>> these fields.
>
> Oops, it seems I gave you wrong info.
> Binary fields CAN store such data.
>
> But since these are BINARY fields, Valentina does NOT index them, and
> therefore cannot sort on them, because Valentina's sort algorithm uses the
> index.
>
> Hmm, then I'm afraid we do not have a way.
> A string field cannot store data with ZERO bytes inside.
> A binary field cannot be sorted...
Drat! Well, it looks like my only other option then is to encode the UTF-16
strings before I put them in the database. Let me think about this...
I guess that if I'm being inaccurate about my sorting (and "living" with
this fact for now), then I could just use UTF-8 instead (which doesn't
contain these zero bytes). UTF-8 encodes non-ASCII characters as multi-byte
sequences, so strings which contained these wouldn't sort properly. But if
I'm happy to live with sorting where multi-byte chars appear *after*
single-byte (ASCII) chars in all cases, then this isn't a problem. It makes
my sorting less accurate, but maybe I can live with this for now!
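A small Python sketch of the UTF-8 option, assuming Valentina compares VVarChar values byte by byte (if it applies its own collation, the order will differ). It shows that UTF-16 text contains &h00 bytes while UTF-8 text does not, and that byte-wise UTF-8 order places non-ASCII characters after all ASCII ones. The helper names are illustrative only.

```python
def utf16_bytes(s: str) -> bytes:
    # Big-endian UTF-16: every ASCII character gets a leading &h00 byte.
    return s.encode("utf-16-be")

def utf8_bytes(s: str) -> bytes:
    # UTF-8: ASCII stays one byte; other characters become 2-4 bytes,
    # all >= &h80, so none of them is &h00.
    return s.encode("utf-8")

word = "café"
print(b"\x00" in utf16_bytes(word))  # True
print(b"\x00" in utf8_bytes(word))   # False

# Byte-wise sorting of UTF-8 follows code-point order, so a word starting
# with a non-ASCII letter lands after every word starting with an ASCII letter.
words = ["zoo", "éclair", "apple"]
print(sorted(words, key=utf8_bytes))  # ['apple', 'zoo', 'éclair']
```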
Alternatively, I could encode my UTF-16 strings so that the zero byte
isn't there. Is it *just* the zero byte (&h00) that isn't allowed in
Valentina strings? If so, I could replace every &h00 with another byte
value such as &hFF (as I know the string is Unicode, and I created it, I'm
assuming that byte won't otherwise appear in the string).
Are there any bytes other than &h00 that aren't allowed in VVarChar?
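One caveat with a plain one-for-one substitution: in UTF-16 a &hFF byte can occur legitimately (for example ÿ, U+00FF, is &h00 &hFF in big-endian order), so the substitution could not always be undone. A reversible alternative is an escape byte. The sketch below is hypothetical: it assumes &h00 is the only forbidden byte and that &h01 is acceptable in a VVarChar, neither of which is confirmed here. It maps &h00 to &h01 &h01 and &h01 to &h01 &h02, leaving every other byte unchanged.

```python
ESC = 0x01  # hypothetical escape byte (assumed to be allowed in VVarChar)

def escape_zero_bytes(data: bytes) -> bytes:
    # &h00 -> ESC 01, ESC -> ESC 02, all other bytes copied unchanged.
    out = bytearray()
    for b in data:
        if b == 0x00:
            out += bytes((ESC, 0x01))
        elif b == ESC:
            out += bytes((ESC, 0x02))
        else:
            out.append(b)
    return bytes(out)

def unescape_zero_bytes(data: bytes) -> bytes:
    # Inverse of escape_zero_bytes; rejects malformed escape sequences.
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b != ESC:
            out.append(b)
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError("truncated escape sequence")
        nxt = data[i + 1]
        if nxt == 0x01:
            out.append(0x00)
        elif nxt == 0x02:
            out.append(ESC)
        else:
            raise ValueError(f"invalid escape sequence at offset {i}")
        i += 2
    return bytes(out)

raw = "Grüße ÿ".encode("utf-16-be")
stored = escape_zero_bytes(raw)
print(b"\x00" in stored)                                   # False
print(unescape_zero_bytes(stored).decode("utf-16-be"))     # Grüße ÿ
```

Because the replacement sequences (&h01 &h01 < &h01 &h02 < &h02 < ... < &hFF) keep the original byte order and none is a prefix of another, this mapping also preserves byte-wise sort order, which matters if the field is to be sorted.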
Third option: I encode *all* bytes of my UTF-16 string into a longer format
(e.g. replace each byte with its two-digit hex equivalent, such that &h00
becomes "00"). This would double my storage requirements for these strings
(actually, the multiplier is x4 compared with one byte per character,
because UTF-16 already uses two bytes per character), but it would
definitely work!
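A minimal Python sketch of the hex option (function names are hypothetical). Using big-endian UTF-16 and uppercase, fixed-width hex digits means that comparing the hex strings character by character gives the same order as comparing the UTF-16 code units, since "0"-"9" sort before "A"-"F" in ASCII.

```python
def to_hex_field(s: str) -> str:
    # UTF-16BE, then two uppercase hex digits per byte (four per code unit).
    return s.encode("utf-16-be").hex().upper()

def from_hex_field(h: str) -> str:
    # Reverse the encoding: hex digits -> bytes -> text.
    return bytes.fromhex(h).decode("utf-16-be")

word = "Grüße"
encoded = to_hex_field(word)
print(encoded)                  # 0047007200FC00DF0065
print(len(word), len(encoded))  # 5 20
print(from_hex_field(encoded))  # Grüße
```

The 5-character word becomes 20 hex characters: 10 bytes of UTF-16, doubled again by the hex step, which is the x4 figure above.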
Thanks in advance for your help,
Dave.
More information about the Valentina mailing list