IR, inverted-index and lists-join

Tonio Virgilio LEVRA levra at yahoo.com
Fri Mar 14 09:19:06 CST 2003


> I need to build a search engine for combined full-text
> and keyword-filtered queries, but I have little
> experience with databases and no experience with search
> engines, information retrieval, etc.
> 
> My source flat file has 65,000 records with one big
> text field (up to 35,000 chars) and at least 8 fields
> of filtering/describing keywords (categories), plus
> other descriptive fields.
> The biggest of these 5 keyword fields can have up to 10
> different keywords per record, and there are at most
> 800 possible values for these keywords.
> 
> Looking for a solution, I am thinking of doing it this
> way:
> 1. Build an inverted index for the big text field

I do not understand this point.
You want to build the index yourself?
Why? Valentina does this task itself.

> 2. Build 5 different inverted indexes for the 5
> category fields
> 3. Store the 6 inverted indexes in different tables in
> the db
> 4. Store the flat file in the db in one table
> 
> and then
> 
> a. Query the 6 inverted indexes according to the
> end user's category filtering and text search.
> 
> b. Store the six resulting lists of records in a temp
> table linked to the big one (see point 4).
> 
> c. Query the temp table for duplicates (because I need to
> match only the records that are part of all six lists) and
> extract as the result the records of the big table (see
> point 4).
> 
> What do you think? Does it make sense?
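
For illustration, the plan above (build inverted indexes, then keep only the records found in every result list) can be sketched in plain Python. This is only a sketch under stated assumptions: the sample records, the field names "text" and "category1", and the helper names build_inverted_index and intersect are all hypothetical, and nothing here uses Valentina itself.

```python
from collections import defaultdict

def build_inverted_index(records, field):
    """Map each word of `field` to the set of record ids containing it."""
    index = defaultdict(set)
    for record_id, record in records.items():
        for word in record[field].lower().split():
            index[word].add(record_id)
    return index

def intersect(lists):
    """Keep only the record ids that appear in every list (step c)."""
    if not lists:
        return set()
    ordered = sorted(lists, key=len)  # start from the shortest list
    result = set(ordered[0])
    for other in ordered[1:]:
        result &= other
    return result

# Hypothetical sample data standing in for the flat file.
records = {
    1: {"text": "full text search engine", "category1": "news sport"},
    2: {"text": "keyword search", "category1": "news"},
    3: {"text": "full text index", "category1": "sport"},
}
text_index = build_inverted_index(records, "text")
category_index = build_inverted_index(records, "category1")

# One list per index queried (step a), then the join (step c).
hits = intersect([
    text_index.get("full", set()),
    category_index.get("news", set()),
])
```

With sets, the "query for duplicates" of step c becomes a plain set intersection, so no temp table is needed in this sketch.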

> Maybe I do not understand something, but why not
> simply do
> 
>     WHERE category1 = 'word' and category2 = 'word2'
> 
> You need to make these category fields Index By Words,
> of course.
> I think your way will not be faster than this one.
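
To make the comparison concrete, here is a rough sketch of how such a filter can be generated when several words are allowed per category. It uses Python's sqlite3 module purely as a stand-in (it is not Valentina and does not use Index By Words), and the table record_keyword with one row per (record, category, word), as well as the function matching_records, are assumptions made for this example.

```python
import sqlite3

def matching_records(conn, filters):
    """Return ids of records that match at least one word in EVERY filtered category.

    `filters` maps a category name to the list of accepted words.
    """
    active = {cat: words for cat, words in filters.items() if words}
    if not active:  # no filter at all: every record matches
        rows = conn.execute(
            "SELECT DISTINCT record_id FROM record_keyword ORDER BY record_id")
        return [row[0] for row in rows]
    clauses, params = [], []
    for category, words in active.items():
        placeholders = ", ".join("?" for _ in words)
        clauses.append(f"(category = ? AND word IN ({placeholders}))")
        params.append(category)
        params.extend(words)
    # A record qualifies only if it was hit in as many distinct
    # categories as there are active filters.
    sql = (
        "SELECT record_id FROM record_keyword "
        f"WHERE {' OR '.join(clauses)} "
        "GROUP BY record_id "
        "HAVING COUNT(DISTINCT category) = ? "
        "ORDER BY record_id"
    )
    params.append(len(active))
    return [row[0] for row in conn.execute(sql, params)]

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE record_keyword (record_id INTEGER, category TEXT, word TEXT)")
conn.executemany("INSERT INTO record_keyword VALUES (?, ?, ?)", [
    (1, "category1", "word"), (1, "category2", "word2"),
    (2, "category1", "word"), (2, "category2", "other"),
    (3, "category1", "word3"), (3, "category2", "word2"),
])
one_word_each = matching_records(
    conn, {"category1": ["word"], "category2": ["word2"]})
two_words = matching_records(
    conn, {"category1": ["word", "word3"], "category2": ["word2"]})
```

The query text is generated in a loop, so its size grows with the number of selected words but the code that builds it does not.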

Well, because I have more than one word for each
category, and if the end user wants to filter on more than
one word per category, the select could be hard to
build and slow to run. Above all, I need
to fill the tree list of the categories with the number
of records for every word listed in each category:

(-word1 [10]
   .word2 [1]
   .word3 [0]
   ...
 -word4 [7]
  .... )
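
The per-word counts for such a tree could be computed roughly as follows. This is a minimal Python sketch, assuming one inverted index per category is already available as a dict from word to set of record ids; the names facet_counts, category_index and result_ids, and the sample values, are hypothetical.

```python
def facet_counts(category_index, result_ids):
    """For each word of one category, count the records in the current result."""
    return {
        word: len(ids & result_ids)
        for word, ids in category_index.items()
    }

# Hypothetical inverted index for one category: word -> record ids.
category_index = {
    "word1": {1, 2, 3},
    "word2": {2},
    "word3": {4},
}
# Record ids that survived the text search and the other filters.
result_ids = {1, 2, 3}

counts = facet_counts(category_index, result_ids)
for word, count in counts.items():
    print(f"{word} [{count}]")
```

Words with no matching record still appear with a count of 0, which is what the tree needs.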

Tonio
