IndexStyle

jda jda at his.com
Wed Nov 3 09:44:20 CST 2004


>>
>>
>>-- I think maybe Valentina should by default use a style with a length limit of at
>>least 2 or 3, or maybe even 4?  Or is it better to leave this to the developer?
>
>I'd be inclined to want everything indexed unless I as a developer 
>explicitly overrode that behavior.  But I suppose that having a 
>default length limit of 2 or maybe 3 would be OK -- as long as it 
>was very clearly documented and overrideable.

I agree. In the sciences, for example, two-letter abbreviations are 
common (Ia, Cu, ad infinitum) and users may want to search for these 
abbreviations. So for my purposes, for example, I'd probably exclude 
only one-letter words from the index (I, a, etc.). BTW, would this be 
letters or bytes? If one uses UTF-8, one letter may take 1 to 4 bytes.
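
To make the letters-versus-bytes question concrete, here is a minimal 
sketch in Python (not Valentina code). The helper name long_enough and 
its min_letters parameter are made up for illustration; the point is 
only that a limit counted in bytes and a limit counted in letters 
disagree as soon as a word contains non-ASCII characters.

```python
# Hypothetical helper, not part of Valentina: decide whether a word is
# long enough to index, counting letters (code points) rather than bytes.
def long_enough(word: str, min_letters: int = 2) -> bool:
    return len(word) >= min_letters


# "\u00e9" is a single accented letter that takes two bytes in UTF-8.
for word in ["a", "Cu", "\u00e9"]:
    n_letters = len(word)
    n_bytes = len(word.encode("utf-8"))
    print(repr(word), n_letters, "letter(s),", n_bytes, "byte(s)",
          "-> index" if long_enough(word) else "-> skip")
```

A limit of 2 counted in bytes would index the one-letter word above, 
while the same limit counted in letters would skip it.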

>
>The main Unicode categories are letters, numbers, punctuation, 
>symbols, marks, separators, and miscellaneous.  Subcategories exist 
>for upper- & lower-case letters, different types of punctuation, 
>etc.  And there are other category "properties" such as whitespace, 
>quotation marks, alphabetic, etc.  All the work has already been 
>done to assign a category and (optionally) properties to each 
>Unicode code point.
>
><http://www.unicode.org/Public/UNIDATA/UCD.html#General_Category_Values>
><http://www.unicode.org/Public/UNIDATA/UCD.html#White_Space> (etc.)
>
>Developers could easily define their break-character categories 
>(e.g., separators, punctuation, and whitespace) and then define what 
>resulting "words" they want indexed (e.g., any "word" whose 
>characters are all letters, every "word" that doesn't include a 
>number, or only "words" made up of only uppercase letters).
>
>Perhaps methods such as:
>
>	style.BreakCategory("separator") = True    // All separators and punctuation
>	style.BreakCategory("punctuation") = True  // are now break characters,
>	style.BreakCategory("number") = False      // but numbers are not.
>
>could work for defining break-character categories.  Defining good 
>Valentina defaults would keep us from having to use these too much, 
>of course, and we'd probably still need to be able to override 
>specific items in the break-character categories -- perhaps with 
>your "SetBreakers" method, above.  Still, the ability to define 
>large groups of characters quickly, without having to worry about 
>whether or not you've left out a character or two, would be quite 
>worthwhile, I think.

Very well said.
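
For what it's worth, the category idea can be tried out today with the 
Unicode tables that ship with Python. This is only a sketch under my 
own assumptions, not a proposal for the actual Valentina API: the 
function split_words and the MAJOR mapping are invented names, and I 
am assuming that "separator", "punctuation", etc. correspond to the 
first letter of the Unicode General_Category code (Z, P, and so on).

```python
import unicodedata

# Assumed mapping from the category names used above to the first letter
# of the Unicode General_Category code (e.g. "Zs" -> separator).
MAJOR = {"letter": "L", "mark": "M", "number": "N", "punctuation": "P",
         "symbol": "S", "separator": "Z", "other": "C"}


def split_words(text, break_categories):
    """Split text on every character whose major category is a breaker."""
    breakers = {MAJOR[name] for name in break_categories}
    words, current = [], []
    for ch in text:
        if unicodedata.category(ch)[0] in breakers:
            # A break character ends the current word, if there is one.
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


# Separators and punctuation break words; numbers do not.
words = split_words("Cu-64, half-life 12.7 h", {"separator", "punctuation"})
print(words)

# Then choose which resulting "words" to index, e.g. letters only, 2+ long.
indexed = [w for w in words if w.isalpha() and len(w) >= 2]
print(indexed)
```

One thing the sketch makes visible: tabs and newlines fall under the 
control characters ("other"), not under "separator", so a whitespace 
property would still be needed alongside the general categories.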

Jon

