Please confirm it is okay now
Ruslan Zasukhin
sunshine at public.kherson.ua
Thu Jul 27 14:12:02 CDT 2006
On 7/27/06 1:58 PM, "Stan Busk" <maxprog at mac.com> wrote:
> Hi,
>
> I was looking at:
> http://icu.sourceforge.net/apiref/icu4j/com/ibm/icu/text/BreakIterator.html
>
> In "getWordInstance() returns a BreakIterator..." sentence it says,
> (Numbers count as words, too.) Whitespace and punctuation are kept
> separate from real words.
>
> Does it mean aaa123bbb456 will be indexed (when indexed by word) as:
>
> aaa
> 123
> bbb
> 456
>
> ???
No, it will be indexed as:

aaa 123 bbb456

Yes, 123 is a separate word.
> Also what would happen with ,;:?! ? They are punctuation characters
> but actually the dot '.' doesn't create several words, at least:
>
> aaa.bbb is indexed (when Indexed by word) as:
>
> aaa.bbb
>
> would it be the same with ,;:?! ?
>
>
> Note that it is just out of curiosity....
Please look at the Examples/Common/SplitToWords example.
It lets you easily experiment with this feature.
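
For a quick experiment outside Valentina, here is a minimal sketch of the same
idea. It is not the SplitToWords example itself. It assumes the JDK's own
java.text.BreakIterator, whose getWordInstance(), setText(), first(), next()
and DONE are also offered by ICU4J's com.ibm.icu.text.BreakIterator. The class
name SplitToWordsSketch and the sample strings are only for illustration. The
exact boundaries depend on the rule set and version in use, so the result may
differ from what Valentina's index produces.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of Valentina or ICU.
public class SplitToWordsSketch {

    // Returns every segment found between two consecutive word boundaries.
    // Whitespace and punctuation come back as their own segments.
    static List<String> split(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        List<String> parts = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            parts.add(text.substring(start, end));
        }
        return parts;
    }

    public static void main(String[] args) {
        // Strings taken from the questions in this thread.
        String[] samples = {"aaa123bbb456", "aaa.bbb", "aaa,bbb;ccc:ddd?eee!fff"};
        for (String s : samples) {
            System.out.println(s + " -> " + split(s));
        }
    }
}
```

Running it prints the segments for each sample, so you can see directly how
letters, digits and the ,;:?! characters are treated by the rule set you have.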
--
Best regards,
Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc
Valentina - Joining Worlds of Information
http://www.paradigmasoft.com
[I feel the need: the need for speed]