Valentina 2.0. -- What is your 3 DREAM features? // UTF

Ruslan Zasukhin sunshine at public.kherson.ua
Sat Jan 31 20:35:15 CST 2004


on 1/31/04 9:18 AM, Eric Forget at forgete at cafederic.com wrote:

> If I may add my grain of salt on this, I will go further: let the user of
> the database (us) choose. It could be for the whole database, i.e. not
> necessarily at field level. But as always the more flexibility the better...
> 
> Here's why:
> 
>   UTF-8 takes 1 to 4 bytes per character
>   UTF-16 takes 2 or 4 bytes per character
>   UTF-32 takes 4 bytes per character
> 
> Now imagine that you have a field where you allow the user to enter up to 64
> characters. Quiz: how much space do you have to reserve for that field in the
> database for each of the 3 encodings?
> 
>   Encoding        For Americans (bytes)   For International (bytes)
>   --------        ---------------------   -------------------------
> 
>    UTF-8                  64                      256
>    UTF-16                 128                     256
>    UTF-32                 256                     256
> 
> So, as we can see, as soon as we develop software for international use, we
> need to reserve 4 bytes per character anyway. The only difference will be where
> the blanks are put: at the end or in the middle of the string.
> 
> However, for all comparisons UTF-32 will be faster since there is no
> conversion of the characters to be done. A 4-byte UTF-8 character is more
> costly to compare than a 4-byte UTF-16 character, which in turn is more
> costly than a UTF-32 character. Unfortunately, most Unicode implementations
> support only UTF-8 and UTF-16.
> 
> So, for today the ideal is to use UTF-8 for Americans (or French, etc.) and
> use UTF-16 everywhere else.
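
[A minimal sketch of the size arithmetic quoted above, written in Python
purely for illustration; nothing here is Valentina code. The helper name
encoded_size and the sample characters are assumptions made for this
example. The "-le" codec variants are used so that Python does not prepend
a byte-order mark, which would otherwise inflate the counts.]

```python
# Illustrative only: count how many bytes a string occupies in each
# Unicode encoding form. Names and sample data are hypothetical.
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_size(text: str, encoding: str) -> int:
    """Return the number of bytes `text` occupies in `encoding` (no BOM)."""
    return len(text.encode(encoding))


# One character from each "size class" of the three encodings.
samples = {
    "ASCII letter A (U+0041)": "A",
    "Latin e-acute (U+00E9)": "\u00e9",
    "Cyrillic Ya (U+042F)": "\u042f",
    "CJK ideograph (U+4E2D)": "\u4e2d",
    "Musical G clef (U+1D11E)": "\U0001d11e",
}

for label, ch in samples.items():
    sizes = ", ".join(f"{enc}: {encoded_size(ch, enc)}" for enc in ENCODINGS)
    print(f"{label:26} {sizes}")

# A 64-character field: pure ASCII versus characters outside the
# Basic Multilingual Plane (the worst case for all three encodings).
for label, field in (("ASCII", "A" * 64), ("non-BMP", "\U0001d11e" * 64)):
    print(label, [encoded_size(field, enc) for enc in ENCODINGS])
```

[Under these assumptions the 64-character ASCII field needs 64, 128 and 256
bytes, and the non-BMP field needs 256 bytes in every encoding, which matches
the table. Accented Latin letters such as U+00E9 already take 2 bytes in
UTF-8, so the "American" column is really the pure-ASCII case.]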

Yes, Eric, we perfectly understand all this.

> Once implementations are added for UTF-32,
> it will become useful to switch to it. UTF-32 is the only native encoding.
> It used to be UTF-16, but that changed recently...

And then we can all pay less attention to the size of the db.

HDDs are becoming faster and bigger.
CPUs are becoming faster.

BTW, note that OS X, Windows, and I think Linux, all use UTF-16 as native.

> Since it is a new code base, why not start going with the flexibility...

-- 
Best regards,
Ruslan Zasukhin      [ I feel the need...the need for speed ]
-------------------------------------------------------------
e-mail: ruslan at paradigmasoft.com
web: http://www.paradigmasoft.com

To subscribe to the Valentina mail list go to:
http://lists.macserve.net/mailman/listinfo/valentina
-------------------------------------------------------------


