IndexbyWords // Vstring.SplitToWords( text as string)
jda
jda at his.com
Sun Nov 20 12:08:25 CST 2005
>>
>>> Unfortunately it turns out that "words" as defined by SplitToString include
>>> "." and exclude "_".
>>
>> I believe they follow natural language rules.
>>
>>> The latter is unfortunate for my application. The
>>> former seems to be unfortunate for most every application I can think of
>>> that might want to otherwise use IndexedByWords.
>>
I really hate to contribute to this endless series of messages, but
Ruslan, be careful what you do. The ICU library is smart and ignores
periods if they end words, but not if they are in the middle of
"words". The last time I tested this, "foo." became "foo", but
"foo.bar" stayed as "foo.bar". I assume this is mostly done so that
decimal numbers aren't split into two (21.50 isn't split into 21 and
50).
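
For illustration only, here is a rough Python sketch of the behavior described above. It does not call ICU and is not how Valentina implements SplitToWords; the function name split_to_words and the regular expression are hypothetical stand-ins that only mimic the rule "keep a period between two word characters, drop it at the end of a word". Treating "_" as a separator is likewise an assumption taken from the report quoted earlier in this thread.

```python
import re

# Hypothetical approximation, NOT ICU's actual word-break algorithm:
# a run of letters/digits, optionally followed by ".<run>" groups, so a
# period survives only when it sits between two word characters.
# "_" is deliberately excluded from word characters (assumption from the thread).
_WORD = re.compile(r"[^\W_]+(?:\.[^\W_]+)*")


def split_to_words(text):
    """Return the words found in text under the simplified rule above."""
    return _WORD.findall(text)


if __name__ == "__main__":
    for sample in ("foo.", "foo.bar", "It costs 21.50."):
        print(repr(sample), "->", split_to_words(sample))
```

Under this simplified rule a trailing period is dropped, while "foo.bar" and "21.50" each stay in one piece, which is why an index built this way would not match a search for "bar" alone.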
Jon