IndexbyWords // Vstring.SplitToWords( text as string)

jda jda at his.com
Sun Nov 20 12:08:25 CST 2005


>>> Unfortunately it turns out that "words" as defined by SplitToString include
>>> "." and exclude "_".
>>
>> I believe they follow natural language rules.
>>
>>> The latter is unfortunate for my application.  The
>>> former seems to be unfortunate for most every application I can think of
>>> that might want to otherwise use IndexedByWords.

I really hate to contribute to this endless series of messages, but 
Ruslan, be careful what you do. The ICU library is smart and ignores 
periods if they end words, but not if they are in the middle of 
"words". The last time I tested this, "foo." becomes "foo", but 
"foo.bar" stays as "foo.bar". I assume this is mostly done so that 
decimal numbers aren't split into two (21.50 isn't split into 21 and 
50).
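
To make the behavior concrete, here is a rough Python sketch of the rule as 
described above. It is only a regex approximation, not ICU and not the 
Valentina API; the name split_to_words and the ASCII-only character class 
are assumptions made for illustration.

```python
import re

# Approximation only: a word is a run of ASCII letters/digits, optionally
# continued by single periods that have a letter or digit on both sides.
# A period at the end of a word is therefore left out of the match.
WORD_RE = re.compile(r"[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*")


def split_to_words(text):
    """Hypothetical stand-in for the splitting behavior described above."""
    return WORD_RE.findall(text)


if __name__ == "__main__":
    print(split_to_words("foo. foo.bar costs 21.50."))
    # -> ['foo', 'foo.bar', 'costs', '21.50']
```

If the real splitter behaves the way I saw in my test, stripping periods 
yourself before indexing would change "foo.bar" and "21.50", so check 
what your data looks like first.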

Jon

