Mantis #0005533 -- NEW in 4.9.1 -- Normalization of Unicode Text // Blog, Wiki..

Ruslan Zasukhin ruslan_zasukhin at valentina-db.com
Thu Jul 21 05:29:17 CDT 2011


On 7/20/11 7:47 PM, "jda" <jda at his.com> wrote:

Hi Jon,

> Hi Ruslan,
> 
> Glad to see that this issue will be resolved in the next update.

Yes me too.
I have spent about 10 days on it ...

And while I was working with your db, apart from this issue,
we have improved other things in vstudio and the engine.
That is good :)
 

> But I have one question for you...
> 
> 1. Valentina must convert RB UTF-8 to UTF-16 when it writes.

Yes. 

> 2. I assume that composed UTF-8 -> composed UTF-16 and decomposed
> UTF-8 -> decomposed UTF-16.
> 
> That's where the problem comes from, right? Decomposed UTF-8?

Probably.

FACT IS: 
    The UTF-16 string which we get after Converter_UTF8_to_UTF16 is decomposed.

And FACT IS that only a few records in your db were like that.
The others were composed normally ... correctly ...


-------
> If so, why not simply have Valentina convert decomposed UTF-8 to
> composed UTF-16 when it writes?

Do you know how to do this? :-)

ICU gives us just the conversion from UTF-8 to UTF-16.
There is nothing there about ALSO doing normalization.

Normalization can be done later, when we have UTF-16 strings.
ALL modifications of a Unicode string always happen only on UTF-16 in ICU.
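
The two separate steps described above (convert first, normalize afterwards) can be sketched roughly as below. This is only an illustration in Python with the standard unicodedata module, not the ICU calls or the Valentina engine code; the function name utf8_to_composed_text is made up for the example, and a Python str stands in for the UTF-16 string.

```python
import unicodedata

def utf8_to_composed_text(raw: bytes) -> str:
    # Step 1: pure encoding conversion. A decomposed input stays decomposed.
    text = raw.decode("utf-8")
    # Step 2: normalization as a separate pass over the converted string (NFC = composed form).
    return unicodedata.normalize("NFC", text)

# "e" followed by U+0301 COMBINING ACUTE ACCENT, encoded as UTF-8.
decomposed_utf8 = "e\u0301".encode("utf-8")
converted_only = decomposed_utf8.decode("utf-8")    # still 2 code points
composed = utf8_to_composed_text(decomposed_utf8)   # 1 code point, U+00E9
```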


> If the character is already composed
> this should take no extra time (or so little it won't matter). That
> would be true in 99.99% of cases. Only the rare decomposed UTF-8
> (entered via copy/paste from a web page, for example) would need
> conversion.
 
> So why can't Valentina take care of making sure all *new* text
> entered is composed when stored? (Old databases would need updating,
> of course). Am I missing something?

Yes, this CAN ALSO BE ADDED to the engine ...
As the next step, Jon.
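
A minimal sketch of the "check first, normalize only if needed" idea from the question above, again in Python and only as an assumption about how such a step could look. The name compose_for_storage is hypothetical and is not an engine function; unicodedata.is_normalized needs Python 3.8 or later.

```python
import unicodedata

def compose_for_storage(text: str) -> str:
    # Fast path: text that is already composed (NFC) is returned unchanged.
    if unicodedata.is_normalized("NFC", text):
        return text
    # Rare path: decomposed text, e.g. pasted from a web page, is composed.
    return unicodedata.normalize("NFC", text)

already_composed = compose_for_storage("caf\u00e9")
was_decomposed = compose_for_storage("cafe\u0301")
```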


What worries me is the IBM db that you and I have read the docs about.
They do NOT set the Normalize flag by default for a field.

Yes, we can try to add such a flag for our VarChar/Text fields also.
Yes of course...

But as the next step ...
And I think we should also make this flag OFF by default...

=============
I have already said why -- because e.g. a GUI / WEB app, or any other app
which has a user front end, should normalize text ITSELF immediately when the
user types or copy-pastes it into an Edit field.

This is what Finder and Safari do.

And then the db engine will get already normalized strings.
So why have the flag ON?
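
To show the hole that front-end normalization closes, here is a small sketch. It assumes a hypothetical input handler named on_text_entered; it is not taken from Finder, Safari or any real GUI toolkit.

```python
import unicodedata

def on_text_entered(user_text: str) -> str:
    # Hypothetical front-end hook: compose the text before the app passes it on.
    return unicodedata.normalize("NFC", user_text)

typed = "Z\u00fcrich"     # composed u-umlaut, as usually typed
pasted = "Zu\u0308rich"   # "u" + U+0308 COMBINING DIAERESIS, as sometimes pasted

# The two strings look the same on screen but differ code point by code point.
raw_match = (typed == pasted)
# After the front end normalizes both, they compare equal.
clean_match = (on_text_entered(typed) == on_text_entered(pasted))
```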


============
Right now our position is the following:

* we have given you tools that allow you to FIX existing dbs quite easily,
from Vstudio or at runtime from e.g. RB code.

You can even use these tools to do pre-normalization yourself.
YOU CAN do this now.
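
As a rough picture of what such a fix pass does, the sketch below walks over stored values and rewrites only the ones that are not composed. It is an assumption for illustration only: the records are a plain Python list, and fix_records is a made-up name, not the Vstudio tool or a Valentina API call.

```python
import unicodedata

def fix_records(records: list) -> int:
    # Rewrite in place only the values that are not in composed (NFC) form.
    fixed = 0
    for i, value in enumerate(records):
        if not unicodedata.is_normalized("NFC", value):
            records[i] = unicodedata.normalize("NFC", value)
            fixed += 1
    # Return how many values had to be changed.
    return fixed

rows = ["abc", "cafe\u0301", "caf\u00e9"]
changed = fix_records(rows)   # only the decomposed record is touched
```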


And you should improve your OWN GUI in your app to close that hole also.





-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]



