FW: [icu-support] Help please on string compare with accents

Ruslan Zasukhin ruslan_zasukhin at valentina-db.com
Thu Aug 7 01:02:13 CDT 2014



Hi, by the way, you can try ICU collation using
<http://demo.icu-project.org/icu-bin/collation.html> - you can just use
'root' (und) for de_DE; it's the same.


------ Forwarded Message
From: Ruslan Zasukhin <ruslan_zasukhin at valentina-db.com>
Reply-To: ICU support mailing list <icu-support at lists.sourceforge.net>
Date: Thu, 07 Aug 2014 07:52:08 +0300
To: ICU support mailing list <icu-support at lists.sourceforge.net>
Cc: Ivan Smahin <ivan_smahin at valentina-db.com>
Subject: Re: [icu-support] Help please on string compare with accents

On 8/7/14, 3:03 AM, "Steven R. Loomis" <srl at icu-project.org> wrote:

Hi Steven,

We think so because our user told us, and we have tested a few apps,
for example Xojo/REALbasic and TextMate.

TextMate sorts the lines

smorebröd:test
smörebröd 
smorebröd:test
smörebröd 
smorebröd:test
smörebröd 

into

smorebröd:test
smorebröd:test
smorebröd:test
smörebröd 
smörebröd 
smörebröd 


We guess that all these Mac OS X Cocoa apps use Apple's NSCompare methods,
which are wrappers around ICU; OS X uses ICU.

But we cannot get this comparison result in our application.
We have tried all possible combinations.
This is why we are so puzzled.


> Ruslan,
>  Why should  "smorebröd:test" < "smörebröd"  ?  German doesn't tailor ö
> as a primary difference, so smorebröd:test sorts after.

Steven, but why then, when the lengths are equal and only one character
differs, does ICU say LESS?

>> // "smorebröd"        < "smörebröd" - correct.


Then we append ":test" to the LESS string, and it becomes GREATER?

>> // "smorebröd:test" < "smörebröd" - not, but should be.


 
>  In, say, Swedish (sv) then you would get the results your test expects.
> 
>  P.S. you want to make sure to test the error result ( if
> U_FAILURE(status) ) and not to overwrite it with U_ZERO_ERROR
> afterwards. I've attached an updated version of the test code.
> 
> -s
> 
> 
> On 08/06/2014 03:45 PM, Ruslan Zasukhin wrote:
>> Hi all,
>> 
>> One user has complained that our software compares two strings incorrectly.
>> We have narrowed this down to simple ICU code that reproduces the problem.
>> We also see that other software on our Macs works as the user expects.
>> 
>> We have tried all possible combinations of collator settings, but nothing helps.
>> Can anybody point out where the problem is?
>> 
>> Also, I have noticed that ucol_open( "de_DE" ) returns UCOL_NORMALIZATION_MODE.
>> 
>> 
>> //
>> // "smorebröd"        < "smörebröd" - correct.
>> // "smorebröd:test" < "smörebröd" - not, but should be.
>> //
>> 
>> This is our code, as simple as possible.
>> 
>> 
>> #include <unicode/ucol.h>
>> 
>> void Test_ICU_Collator( void ) throw()
>> {
>>     TEST_NAME( "Test_ICU_Collator" );
>> 
>> 
>>     const UChar val0[] =
>>         {
>>             UChar('s'),
>>             UChar('m'),
>>             UChar('o'),
>>             UChar('r'),
>>             UChar('e'),
>>             UChar('b'),
>>             UChar('r'),
>>             UChar(0xF6),    // ö
>>             UChar('d')
>>         };
>>         
>>     const UChar val1[] =
>>         {
>>             val0[0],        // s
>>             val0[1],        // m
>>             val0[2],        // o
>>             val0[3],        // r
>>             val0[4],        // e
>>             val0[5],        // b
>>             val0[6],        // r
>>             val0[7],        // ö
>>             val0[8],        // d
>>             UChar(':'),
>>             UChar('t'),
>>             UChar('e'),
>>             UChar('s'),
>>             UChar('t')
>>         };
>>         
>>     const UChar val2[] =
>>     {
>>         val0[0],            // s
>>         val0[1],            // m
>>         UChar(0xF6),    // ö
>>         val0[3],            // r
>>         val0[4],            // e
>>         val0[5],            // b
>>         val0[6],            // r
>>         val0[7],            // ö
>>         val0[8],            // d
>>     };
>> 
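>>     ::UErrorCode  status = ::U_ZERO_ERROR;
>>     UCollator* pCollator = ucol_open( "de_DE", &status );
>>     if( U_FAILURE( status ) )
>>         return;     // stop on error; do not reset status to U_ZERO_ERROR
>> 
>>     // Compare base letters and accents, but ignore case differences.
>>     ucol_setAttribute(
>>         pCollator,
>>         ::UCOL_STRENGTH,
>>         ::UCOL_SECONDARY,
>>         &status );
>>     if( U_FAILURE( status ) )
>>     {
>>         ucol_close( pCollator );
>>         return;
>>     }
>> 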
>>     ::UCollationResult res = ucol_strcoll(
>>                                 pCollator,
>>                                 val0,
>>                                 sizeof(val0)/sizeof(UChar),
>>                                 val2,
>>                                 sizeof(val2)/sizeof(UChar) );
>>     DO_TEST( res == UCOL_LESS );
>> 
>>     res = ucol_strcoll(
>>                                 pCollator,
>>                                 val1,
>>                                 sizeof(val1)/sizeof(UChar),
>>                                 val2,
>>                                 sizeof(val2)/sizeof(UChar) );
>>     DO_TEST( res == UCOL_LESS );    // fails with "de_DE"/root; would pass with "sv"
>>     
>>     ucol_close( pCollator );
>> }
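To illustrate why the second DO_TEST above fails with "de_DE" while the first one passes, here is a toy C++ model of multi-level comparison. It is a sketch, not ICU's implementation: the weights and the names `Element`, `toElement` and `toyCollate` are hypothetical. A collator compares the primary weights (base letters) of the whole strings first and looks at secondary weights (accents) only if every primary weight ties. In root/German, ö is an o with a secondary accent difference, so "smorebröd" and "smörebröd" tie at the primary level and the accent makes the first one LESS. "smorebröd:test", however, has extra primary weights, so it already sorts after "smörebröd" at the primary level and the accent is never consulted. If ö is instead a separate letter after z, as in Swedish, the third character decides and the ":test" suffix no longer matters.

```cpp
#include <string>
#include <vector>

// Toy collation element (hypothetical weights, not ICU's real ones):
// primary = base letter, secondary = accent (0 = none, 1 = diaeresis).
struct Element {
    int primary;
    int secondary;
};

// Maps one character to a toy collation element. Only U+00F6 (ö) is special.
// swedishLike == false: ö is "o plus accent" (primary-equal to o), like root/German.
// swedishLike == true:  ö is a separate letter sorted right after z (simplified Swedish).
Element toElement(char32_t c, bool swedishLike) {
    if (c == U'\u00F6') {
        if (swedishLike) {
            return {static_cast<int>(U'z') + 1, 0};
        }
        return {static_cast<int>(U'o'), 1};
    }
    return {static_cast<int>(c), 0};
}

// Returns -1, 0 or +1, in the spirit of UCOL_LESS / UCOL_EQUAL / UCOL_GREATER.
int toyCollate(const std::u32string& a, const std::u32string& b, bool swedishLike) {
    std::vector<int> pa, pb, sa, sb;
    for (char32_t c : a) {
        Element e = toElement(c, swedishLike);
        pa.push_back(e.primary);
        sa.push_back(e.secondary);
    }
    for (char32_t c : b) {
        Element e = toElement(c, swedishLike);
        pb.push_back(e.primary);
        sb.push_back(e.secondary);
    }
    // Level 1: compare primary weights over the WHOLE string first.
    // std::vector's operator< is lexicographic, so a proper prefix sorts first.
    if (pa != pb) {
        return pa < pb ? -1 : 1;
    }
    // Level 2: accents are consulted only when all primary weights tie.
    if (sa != sb) {
        return sa < sb ? -1 : 1;
    }
    return 0;
}
```

With `swedishLike == false`, `toyCollate(U"smorebr\u00F6d", U"sm\u00F6rebr\u00F6d", false)` returns -1 and `toyCollate(U"smorebr\u00F6d:test", U"sm\u00F6rebr\u00F6d", false)` returns +1, matching Steven's explanation of the "de_DE" result. With `swedishLike == true` the second call returns -1, the order TextMate shows. In real ICU, the corresponding change would be opening the collator with "sv" instead of "de_DE", as Steven suggests.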

-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]



_______________________________________________
icu-support mailing list - icu-support at lists.sourceforge.net
To Un/Subscribe: https://lists.sourceforge.net/lists/listinfo/icu-support
Archives/Project Info: http://site.icu-project.org/contacts


------ End of Forwarded Message



