lowering score of doc if synonyms matched (synonyms indexed)

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

lowering score of doc if synonyms matched (synonyms indexed)

zzzzz shalev
i am currently adding synonyms at index time (and not expanding the query), i fear that there is a problem with this implementation:
   
  is there a way to lower the score of a document if it was found due to a synonyms match and not due to a match of the word queried. from what i understand the synonyms are indexed with the same placement as the original word which may make this impossible?
   
  thanks,
   
   

               
---------------------------------
Blab-away for as little as 1¢/min. Make  PC-to-Phone Calls using Yahoo! Messenger with Voice.
Reply | Threaded
Open this post in threaded view
|

RE: lowering score of doc if synonyms matched (synonyms indexed)

Ziv Gome
As you might have already seen, Andrew Schetinin and I have published (at http://mail-archives.apache.org/mod_mbox/lucene-java-user/200603.mbox/%3c39B0FB508E5D7540ACA5AD57225E150D39203D@...%3e) a source code that handles synonyms at search time (query expansion).

This code includes also a de-boost factor for synonyms (compared with root term). It also fixes the distortion created by IDF relationships of the root terms and their synonyms. The way it produces the score is to simulate an aggregated frequency from frequencies of all synonym terms in each document, and then constructs a score for the "joint frequency.

I realize you are asking about processing synonyms at index time, but note that terms injection does not allow you to (in addition to the de-boost issues you raise in your post):
a) Change the synonym dictionary, once index is built.
b) Change the boost factor once index is built.
c) enable/disable the option of using synonyms (e.g. some applications has an "exact match" feature, or the client simply doesn't want to drift from "car" to "auto").


BTW, for reply please use ziv.gome_gmail_com (replace "_" where appropriate)

Thanx,
Ziv Gome

-----Original Message-----
From: zzzzz shalev [mailto:[hidden email]]
Sent: Wednesday, May 10, 2006 10:36 AM
To: [hidden email]
Subject: lowering score of doc if synonyms matched (synonyms indexed)

i am currently adding synonyms at index time (and not expanding the query), i fear that there is a problem with this implementation:
   
  is there a way to lower the score of a document if it was found due to a synonyms match and not due to a match of the word queried. from what i understand the synonyms are indexed with the same placement as the original word which may make this impossible?
   
  thanks,
   
   

               
---------------------------------
Blab-away for as little as 1¢/min. Make  PC-to-Phone Calls using Yahoo! Messenger with Voice.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]