IR, inverted-index and lists-join
Ruslan Zasukhin
sunshine at public.kherson.ua
Fri Mar 14 01:02:08 CST 2003
on 3/13/03 10:39 PM, Tonio Virgilio LEVRA at levra at yahoo.com wrote:
Hi Tonio,
> I need to build a search engine for combined full-text
> and keyword-filtered queries, but I have little
> experience with databases and none with search engines,
> information retrieval, etc.
>
> My source flat file has 65,000 records with one big
> text field (up to 35,000 chars) and at least 8 fields
> of filtering/describing keywords (categories), plus
> other descriptive fields.
> The biggest of these 5 keyword fields can hold up to 10
> different keywords per record, and there are at most
> 800 possible values for these keywords.
>
> Looking for a solution, I am thinking of doing it this
> way:
> 1. Build an inverted index for the big text field
I do not understand this point.
Do you want to build the index yourself?
Why? Valentina does this task itself.
> 2. Build 5 different inverted indexes for the 5
> category fields
> 3. Store the 6 inverted indexes in different tables in
> the db
> 4. Store the flat file in the db in one table
>
> and then
>
> a. Query the 6 inverted indexes according to the
> end-user category filtering and text search.
>
> b. Store the six resulting lists of records in a temp
> table linked to the big one (see point 4).
>
> c. Query the temp table for duplicates (because I need to
> match only the records that appear in all six lists) and
> extract as the result the records of the big table (see
> point 4).
>
> What do you think? Does it make sense?
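
For reference, steps a to c of the quoted plan amount to intersecting posting lists: one list of record ids per filter, keeping only the ids that appear in every list. The following is a minimal Python sketch of that idea, not the poster's code. The sample records, the field names category1 and category2, and the helpers build_inverted_index and match_all are all hypothetical.

```python
from collections import defaultdict

def build_inverted_index(records, field):
    """Map each keyword found in `field` to the set of record ids holding it."""
    index = defaultdict(set)
    for rec_id, rec in records.items():
        for word in rec[field]:
            index[word].add(rec_id)
    return index

def match_all(indexes, query):
    """Return the ids present in every posting list named by `query`.

    `query` maps a field name to the keyword wanted in that field.
    """
    postings = [indexes[field].get(word, set()) for field, word in query.items()]
    if not postings:
        return set()
    # Intersect starting from the shortest list to keep the work small.
    postings.sort(key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return result

# Hypothetical sample data: each category field holds several keywords.
records = {
    1: {"category1": ["red", "small"], "category2": ["metal"]},
    2: {"category1": ["red"], "category2": ["wood"]},
    3: {"category1": ["blue", "small"], "category2": ["metal"]},
}
indexes = {f: build_inverted_index(records, f) for f in ("category1", "category2")}
hits = match_all(indexes, {"category1": "red", "category2": "metal"})
```

Done in memory like this, the "temp table plus duplicate query" of steps b and c collapses into a set intersection.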
Maybe I do not understand something, but why not simply do
WHERE category1 = 'word' AND category2 = 'word2'
Of course, you need to set these category fields to Index By Words.
I think your way will not be faster than this one.
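
The single-query approach can be tried out with a short Python sketch. It uses SQLite through the standard sqlite3 module as a stand-in, not Valentina, so it does not show Valentina's Index By Words option. Because an ordinary SQLite column index does not index the separate words of a field, the sketch assumes a helper table keywords with one row per (record, field, word). The table names, column names, sample rows, and the search function are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (id INTEGER PRIMARY KEY, body TEXT);
    -- One row per keyword occurrence; stands in for a per-word field index.
    CREATE TABLE keywords (record_id INTEGER, field TEXT, word TEXT);
    CREATE INDEX keywords_by_word ON keywords (field, word, record_id);
""")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [(1, "first text"), (2, "second text"), (3, "third text")],
)
conn.executemany(
    "INSERT INTO keywords VALUES (?, ?, ?)",
    [
        (1, "category1", "red"), (1, "category1", "small"), (1, "category2", "metal"),
        (2, "category1", "red"), (2, "category2", "wood"),
        (3, "category1", "blue"), (3, "category2", "metal"),
    ],
)

def search(conn, filters):
    """Return ids of records matching every field/word pair in `filters`."""
    sql = "SELECT id FROM records WHERE 1 = 1"
    params = []
    for field, word in filters.items():
        # One indexed existence check per filter, all combined with AND.
        sql += (
            " AND EXISTS (SELECT 1 FROM keywords k"
            " WHERE k.record_id = records.id AND k.field = ? AND k.word = ?)"
        )
        params += [field, word]
    sql += " ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]

found = search(conn, {"category1": "red", "category2": "metal"})
```

The database combines the conditions itself, so no temp table or duplicate-finding query is needed on the application side.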
--
Best regards,
Ruslan Zasukhin [ I feel the need...the need for speed ]
-------------------------------------------------------------
e-mail: ruslan at paradigmasoft.com
web: http://www.paradigmasoft.com
To subscribe to the Valentina mail list go to:
http://listserv.macserve.net/mailman/listinfo/valentina
-------------------------------------------------------------