V4RB, Jon, project

Tue Sep 14 17:50:00 CDT 2004

>
>  >
>>  Why? UTF-8 is the "native" format for RB.
>
>I knew you would say this.

Of course you did! :->

>
>>  It also, for Western
>>  languages, usually requires only a little more storage than UTF-16.
>>  Then what makes it a bad choice for Valentina Developers (especially
>>  RB ones)?
>
>Right, and for e.g. Cyrillic it will eat 2 bytes per char.

And for Japanese even more than 2. But I'd like it to be my choice...

>
>So what do we get then?
>
>     If they all make Vstring(50) as UTF16
>     then they all can store 50 chars.
>
>     German/USA developer makes Vstring(50) as Latin1
>         he can really store 50 chars of English or German here
>
>     Russian developer makes Vstring(50) as Cyrillic-win
>         he can store 50 chars here.
>
>     German/USA developer makes Vstring(50) as UTF8
>         he can really store 50 chars of English or German here
>
>     Russian developer makes Vstring(50) as UTF8
>         he can store only 25 chars here.    <<<<<<<<<<< OOPS
>
>Inconsistency.
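
To put numbers on the list above, here is a minimal Python sketch (not
Valentina or RB code). It assumes the declared size counts storage units
of the field's encoding: bytes for UTF-8 and the 8-bit code pages, 16-bit
units for UTF-16. That is what the figures in the list suggest, but it is
an assumption. The helper name code_units and the sample strings are made
up for illustration.

```python
def code_units(text: str, encoding: str) -> int:
    """Storage units `text` needs in `encoding` (assumed unit sizes, see above)."""
    if encoding == "utf-16":
        # Explicit little-endian codec so no BOM is counted; 2 bytes per unit.
        return len(text.encode("utf-16-le")) // 2
    # UTF-8 and single-byte code pages: one unit is one byte.
    return len(text.encode(encoding))


english = "a" * 50    # 50 ASCII characters
russian = "я" * 50    # 50 Cyrillic characters
japanese = "日" * 50  # 50 kanji

for label, sample in [("English", english), ("Russian", russian), ("Japanese", japanese)]:
    print(label, "utf-8:", code_units(sample, "utf-8"),
          "utf-16:", code_units(sample, "utf-16"))
```

Under these assumptions, 50 characters always need 50 units in UTF-16
for these samples, while in UTF-8 the same 50 characters need 50, 100,
or 150 bytes depending on the script.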

>
>We want to write this in the docs, and we think it is correct:
>
>     Vstring( MaxCharsCount )
>
>
>The problem with UTF8 is that it uses a variable number of bytes per char.
>We cannot guarantee that if you make a UTF8 Vstring(50),
>you will be able to store 50 chars in any language there.
>
>In the end, why do we use Unicode?
>To be able to store any language.
>
>If you want to store only German or only English, then use Latin1.
>If you really want to store any language, then use UTF16.

Because some of us do not know in advance what users will want
to store, but *most* will be some variant of a Western language. If
hits to my web site are any indication, 90% or more are using a
Western language primarily, and 10% or so use Japanese (primarily).
But many Western language users mix in the occasional Japanese,
Greek, Hebrew, or whatever.

>
>
>We have discussed this in depth here.
>Vstring -- causes the biggest problem for UTF8.
>VarChar -- so-so. If you write strings close to the max limit, they again may
>not fit into the declared size.
>Vtext -- has no problems.
>
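
The "may not fit into declared size" case can be checked before writing
a value. The following is only a sketch in Python, assuming the declared
size of a UTF-8 field is a byte budget (an assumption drawn from the
numbers earlier in the thread, not from Valentina's docs). The names
fits and truncate_to_fit are hypothetical helpers, not part of any
Valentina API.

```python
def fits(text: str, declared_bytes: int) -> bool:
    """True if `text` encoded as UTF-8 needs at most `declared_bytes` bytes."""
    return len(text.encode("utf-8")) <= declared_bytes


def truncate_to_fit(text: str, declared_bytes: int) -> str:
    """Longest prefix of whole characters whose UTF-8 form fits the budget."""
    cut = text.encode("utf-8")[:declared_bytes]
    # Cutting by bytes may split the last character; errors="ignore"
    # drops that incomplete trailing sequence instead of raising.
    return cut.decode("utf-8", errors="ignore")


print(fits("a" * 50, 50))                  # ASCII text
print(fits("я" * 26, 50))                  # Cyrillic, 2 bytes per char
print(len(truncate_to_fit("я" * 50, 50)))  # chars kept within 50 bytes
print(len(truncate_to_fit("日" * 50, 50)))  # kanji, 3 bytes per char
```

A mostly Western string with a few Japanese or Greek characters mixed in
would land somewhere between these extremes, which is why only values
close to the declared limit are at risk.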

Any reason, in principle, that we can't mix encodings in a single
database -- use UTF-16 for VStrings and UTF-8 for VText? Does this
have to be database-wide -- can't it be field-specific, like language
is now?

Anyway, if you are saying this as a warning, but we can still use 
UTF-8 if we want, then point taken, and thanks.

Jon