IndexbyWords

Ruslan Zasukhin sunshine at public.kherson.ua
Mon Nov 14 10:30:35 CST 2005


On 11/14/05 12:20 AM, "Ed Kleban" <Ed at Kleban.com> wrote:

Hi Ed,

>  Ruslan,
> 
> I found in the mail archives where you noted:
>  
>     How Valentina breaks a string into words:
>         again, it tries to do this naturally:
>         spaces, punctuation -- all of these are breakers.
> 
> Can you be more specific?  In fact can you be absolutely precise please?

Well, in Valentina 2.0 we use IBM ICU and its algorithms.

ICU breaks text into words using the rules of the specified language/locale.
We do not even control this.

> For example is "_" part of words, or a separator of words?   How about
> digit-only numbers, do these get indexed as words?

I do not even remember :-)

> how about a word
> containing alphabetic characters but beginning with a numeric digit? How
> about the punctuation that separates words... are there any special cases
> where these too are indexed?

Well, it is possible to check all of this using the V4RB example
    
    Common/SplitToWords
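
If you prefer to script the check, here is a minimal sketch in Python of
the kind of probe list I mean. Note the assumptions: split_to_words() below
is only a stand-in built on Python's \w character class. It is NOT the ICU
or Valentina algorithm, and the function and probe names are made up for
illustration. The idea is to run the same probe strings through the
SplitToWords example and compare the results.

```python
import re

# Stand-in tokenizer (assumption): Python's \w class, not ICU's locale rules.
WORD_RE = re.compile(r"\w+")


def split_to_words(text):
    """Return the runs of word characters found in text."""
    return WORD_RE.findall(text)


# Ed's questions turned into probe strings (hypothetical test data).
PROBES = {
    "underscore": "foo_bar",
    "digits only": "12345",
    "leading digit": "4ever",
    "punctuation": "stop, look; listen!",
}


def run_probes(tokenizer=split_to_words):
    """Apply a tokenizer to every probe and collect the resulting words."""
    return {label: tokenizer(text) for label, text in PROBES.items()}


if __name__ == "__main__":
    for label, words in run_probes().items():
        print(f"{label:14} -> {words}")
```

With this stand-in, "_" stays inside a word and digits count as word
characters. Whether Valentina's index behaves the same way is exactly what
the probes are meant to show, so do not take the stand-in output as the
answer.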
 
> I see no clues to any of these questions in any of the manuals.
> 
> Thanks!
> --Ed
> 
> 

-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]



