Question to French people about 'Œuvres' and 'Oeuvres'

Ruslan Zasukhin ruslan_zasukhin at valentina-db.com
Fri Oct 26 15:51:16 CDT 2012


On 10/26/12 11:12 PM, "Thorsten Hohage" <thohage at genericobjects.de> wrote:

> Hi,
> 
> On 2012-10-26, at 21:45, Robert Brenstein <rjb at robelko.com> wrote:
> 
>> On 26.10.2012 at 22:39 Uhr +0300 Ruslan Zasukhin apparently wrote:
>>> Hi Guys,
>>> 
>>> I did work on this bug report
>>>    http://www.valentina-db.com/bt/view.php?id=5780
>>> 
>>> And finally came to understand that the issue comes from a very special case:
>>>  these two French words  'Œuvres'  and 'Oeuvres'
>>> 
>>> Are the same.  ICU collators say they are equal.
>>> 
>>> But they have different lengths: 6 and 7 chars.
>>> So our assumption in code that we can compare lengths first was wrong.
>> 
>> There are more such things in various languages not only French
>> 
>> http://en.wikipedia.org/wiki/Typographic_ligature
>> 
>> Many language handling programs also recognize that ü = ue for example.
> 
> I'm German, so my information may be wrong …
> 
> The situation becomes even worse. AFAIK Sweden decided to simplify the
> handling of Umlauts in digital media, so they NOW define
> 
> Göteborg
> 
> to become
> 
> Goteborg
> 
> by simply replacing all Umlauts with the vowels without dots. So it is not
> always the rule Ü=ue but it depends on locale setting.
> 
> 
> Furthermore the decomposition of Umlauts can cause more issues. In German
> there are several given sort orders and in some cases "Fü" comes behind "Fuz"
> and is not handled like "Fue", btw historically they are at the end  x, y, z, ä,
> ö, ü … really strange :)

Right, guys, I know that many languages have such special chars,
and that for some languages the sorting of words differs from the order of chars.

This is why the IBM ICU library is so big, 10-12 MB,
and this is why we use it in Valentina DB.

But all these German and other words have so far worked fine with our algorithms.

The anomaly here is that

   compare( Œuvres, Oeuvres )  says EQUAL.

But when we do a search
     START WITH ( Œuvres, 6 ),

it compares only 6 chars, and already returns NOT equal.


Maybe this is okay for French.
I just do not see how THIS can be solved.
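
To make the anomaly concrete, here is a minimal sketch in Python. It does NOT
use ICU: the fold() function and its EXPANSIONS table are hypothetical,
hand-written stand-ins for a collation key in which 'Œ' expands to 'oe'.
It only shows why cutting to N chars before comparing fails, and that
comparing on the expanded keys might avoid it.

```python
# Toy stand-in for a collation key. This is NOT ICU; the expansion
# table below is a hypothetical, hand-written subset for illustration.
EXPANSIONS = {"œ": "oe", "æ": "ae"}


def fold(text):
    """Case-fold, then expand ligatures so equal-sorting strings share a key."""
    lowered = text.casefold()
    return "".join(EXPANSIONS.get(ch, ch) for ch in lowered)


def naive_starts_with(value, pattern):
    """The failing approach: cut the value to len(pattern) chars, then compare."""
    return fold(value[:len(pattern)]) == fold(pattern)


def key_starts_with(value, pattern):
    """Compare on folded keys, so length is measured after expansion."""
    return fold(value).startswith(fold(pattern))


if __name__ == "__main__":
    print(len("Œuvres"), len("Oeuvres"))            # the two lengths differ
    print(fold("Œuvres") == fold("Oeuvres"))         # whole-string compare
    print(naive_starts_with("Oeuvres", "Œuvres"))    # cut to 6 chars first
    print(key_starts_with("Oeuvres", "Œuvres"))      # prefix test on keys
```

This is only a sketch under the assumptions above. A real fix would have to
work on the collator's own keys or collation elements rather than a
hand-written table; ICU also has a collation-aware string search service
(StringSearch) that may fit this case, but I have not checked how it
behaves here.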




-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]



