Please confirm it is okay now

Ruslan Zasukhin sunshine at public.kherson.ua
Thu Jul 27 14:12:02 CDT 2006


On 7/27/06 1:58 PM, "Stan Busk" <maxprog at mac.com> wrote:

> Hi,
> 
> I was looking at:
> http://icu.sourceforge.net/apiref/icu4j/com/ibm/icu/text/BreakIterator.html
> 
> In "getWordInstance() returns a BreakIterator..." sentence it says,
> (Numbers count as words, too.) Whitespace and punctuation are kept
> separate from real words.
> 
> Does it mean aaa123bbb456 will be indexed (when Indexed by word) as:
> 
> aaa
> 123
> bbb
> 456
> 
> ???

No, it will be indexed as:

    aaa 123 bbb456

Yes, 123 is a separate word.


> Also what would happen with ,;:?! ? They are punctuation characters
> but actually the dot '.' doesn't create several words, at least:
> 
> aaa.bbb is indexed (when Indexed by word) as:
> 
> aaa.bbb
> 
> would it be the same with ,;:?!  ?
> 
> 
> Note that it is just out of curiosity....

Please look at the Examples/Common/SplitToWords example.

It lets you easily experiment with this feature.
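
If you want to experiment outside Valentina as well, here is a minimal sketch in Java. It rests on assumptions: it uses the JDK's java.text.BreakIterator rather than ICU4J's com.ibm.icu.text.BreakIterator (the two have a similar API, but their break rules are not guaranteed to match each other or Valentina's indexing), and the class WordSplitDemo and its splitToWords method are hypothetical names made up for this illustration. It prints how each sample string is split, so you can compare the result with what the index does.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class for illustration; not part of Valentina or ICU.
class WordSplitDemo {

    // Splits text at word boundaries and keeps only the segments that contain
    // at least one letter or digit, so pure whitespace and punctuation
    // segments are dropped.
    static List<String> splitToWords(String text) {
        List<String> words = new ArrayList<>();
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                words.add(segment);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Samples taken from the questions above; output depends on the
        // break rules of the runtime, so no expected results are listed here.
        String[] samples = {
            "aaa123bbb456", "aaa.bbb", "aaa,bbb", "aaa;bbb",
            "aaa:bbb", "aaa?bbb", "aaa!bbb"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + splitToWords(s));
        }
    }
}
```

Swapping the import for com.ibm.icu.text.BreakIterator should, as far as I know, need no other change to splitToWords, but treat that as an assumption and check it against the ICU4J version you have.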



-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]



