Index by words and apostrophes
Ruslan Zasukhin
sunshine at public.kherson.ua
Thu Nov 30 11:53:14 CST 2006
On 11/30/06 12:45 AM, "Pierre Rossel" <agora07 at prossel.com> wrote:
Hi Pierre,
I have CC'd your question to the ICU list as well, to get the best answer.
What I think is:
Valentina gives you access to 7-9 Locale parameters.
I think that if French has some special rules, then once you set the correct
settings, ICU will do the correct job.
----
For the information of the ICU list: in Valentina we simply use a WordBreakIterator,
which searches for token boundaries according to the current Locale settings.
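
To make the question concrete, here is a minimal sketch in Python of the two
tokenizing behaviours being discussed. It is not ICU and not Valentina code:
the regular expressions and the function name index_words are hypothetical,
chosen only for illustration. The first mode keeps an apostrophe that sits
between two letters inside the word, which is roughly what the default Unicode
word-boundary rules (UAX #29) do; the second mode treats the apostrophe as a
separator, as Pierre expects.

```python
import re

# Assumption: a simplified stand-in for word segmentation, NOT the ICU algorithm.
# A "letter" here is any Unicode word character except digits and underscore.
LETTERS = r"[^\W\d_]+"

# Mode 1: an apostrophe between two letter runs stays inside the word.
KEEP_APOSTROPHE = re.compile(LETTERS + r"(?:'" + LETTERS + r")*")

# Mode 2: the apostrophe acts as a word separator.
SPLIT_ON_APOSTROPHE = re.compile(LETTERS)


def index_words(text, split_on_apostrophe=False):
    """Return the set of lowercased words that an index would store."""
    pattern = SPLIT_ON_APOSTROPHE if split_on_apostrophe else KEEP_APOSTROPHE
    return {match.group(0).lower() for match in pattern.finditer(text)}


if __name__ == "__main__":
    for sentence in ("The lion's pride", "L'orgueil du lion"):
        print(sorted(index_words(sentence)))
        print(sorted(index_words(sentence, split_on_apostrophe=True)))
```

In the first mode, "orgueil" is not a key in the index, because only
"l'orgueil" is stored, so an exact-word search for it misses the record. In the
second mode the word is found, at the cost of also indexing fragments such as
"l" and "s".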
> Hello,
>
> I have noticed that the apostrophe character is not considered as a word
> separator when a text field is indexed by words.
>
> If the text contains a sentence such as "The lion's pride", a search on the
> exact word "lion" won't match.
>
> In French, the same sentence would translate to "L'orgueil du lion". Notice
> the apostrophe which separates "L" from "orgueil". "L" means "The" in
> English. A search on "orgueil" won't match the record as "L'orgueil" is
> considered as one word.
>
> I have reported this as a bug in Mantis
> http://www.valentina-db.com/bt/view.php?id=2008
>
> This is a real problem for me as some words cannot be found by my search
> engine if they are next to an apostrophe.
>
> By the way, I have tried to search the word "orgueil" in the text "L'orgueil
> du lion" with several word processing applications and text editors, with
> the option "whole words only". They all found it, so they all consider the
> apostrophe as a word separator. Why should Valentina behave otherwise?
>
> What do other developers think?
> Is the apostrophe part of a word or not?
--
Best regards,
Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc
Valentina - Joining Worlds of Information
http://www.paradigmasoft.com
[I feel the need: the need for speed]