Support mapping of multi-word Synonym at Query time.

Paras Lehana
Hi Community,

I know there have been many blogs about multi-term synonyms. I have been
reading a lot about this, and I'm here just to get suggestions and learn what
you are doing. The information in those blogs (SynonymFilter) may be
outdated, and there might be better methods now (SynonymGraphFilter).

For illustrative purposes, suppose my synonym file contains only this rule
(expand=true): fountain pin, fountain pen

Here is what I have understood from researching this (including what I
want to achieve):

   1. SynonymFilter doesn't handle multi-word synonyms properly at index
   time, but SynonymGraphFilter does. So docs with "fountain pin" are (also)
   indexed as "fountain pen" and vice versa. Also, docs with just "pin" don't
   get indexed with "pen", which is what I want.

   2. At query time, "fountain pin" will match "fountain pen", which is
   cool. But a query for just "pin" will also match "fountain pen". I want
   "fountain pen" to match the query "pen" but, obviously, not "pin".

   3. One way could be to use sow=false. But if I set sow (splitOnWhitespace)
   to false, I will need to use SynonymGraphFilter at query time too, right?

   4. I prefer not to use synonym analysis on the query side. I work on
   Auto-Suggest, and under no circumstances do I want my response time to go
   above 25 ms. I don't know how reading the synonym file at query time would
   impact QTime, so I'm open to criticism here. Besides, many blogs recommend
   not using synonyms at query time.

   5. Can I achieve what I want with autoGeneratePhraseQueries? We already
   use eDismax, so there is some room to tweak our queries to handle the
   problem.

   6. There also seems to be a custom parser, synonym_edismax. Does anyone
   have experience with it? The IndiaMART Product Search team already uses
   it, but I'm exploring better ways, if there are any.

   7. Also, Lucidworks has recommended using the Auto Phrasing TokenFilter.
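
To make points 1 to 3 concrete, here is a toy Python model. It is an illustration under stated assumptions, not Lucene's actual implementation: it assumes lowercase whitespace tokenization and only the single rule above, models index-time expansion as stacking the alternative terms on the same positions (roughly what a flattened synonym graph looks like in the index), and treats each query alternative as a phrase for simplicity. The helper names `index_tokens`, `query_alternatives` and `matches` are hypothetical.

```python
# Toy model only: NOT how Lucene's SynonymGraphFilter works internally.
# Assumes lowercase whitespace tokenization and one expand=true rule:
# "fountain pin, fountain pen".
RULES = [("fountain", "pin"), ("fountain", "pen")]


def index_tokens(text, expand=True):
    """Map each token position to the set of terms indexed there.

    With expand=True, alternatives of a matched multi-word rule are stacked
    on the same positions, so "pin" ends up indexed where "pen" is, and
    vice versa.
    """
    words = text.lower().split()
    positions = {i: {w} for i, w in enumerate(words)}
    if expand:
        for i in range(len(words)):
            for rule in RULES:
                if tuple(words[i:i + len(rule)]) == rule:
                    for alt in RULES:  # every rule here has two words
                        for k, term in enumerate(alt):
                            positions[i + k].add(term)
    return positions


def query_alternatives(query):
    """Toy query-time expansion over the whole query, as sow=false allows.
    This simplified version fires only when the whole query equals a rule."""
    words = tuple(query.lower().split())
    if words in RULES:
        return [" ".join(rule) for rule in RULES]
    return [" ".join(words)]


def matches(positions, query, expand_query=False):
    """True if the query (or one of its alternatives) occurs as a phrase."""
    alternatives = query_alternatives(query) if expand_query else [query.lower()]
    for alt in alternatives:
        terms = alt.split()
        for start in positions:
            if all(t in positions.get(start + k, set()) for k, t in enumerate(terms)):
                return True
    return False


doc = index_tokens("Parker fountain pen")  # index-time expansion only
print(matches(doc, "pin"))  # True: the unwanted match from point 2

plain = index_tokens("Parker fountain pen", expand=False)  # no index-time synonyms
print(matches(plain, "pin", expand_query=True))           # False
print(matches(plain, "fountain pin", expand_query=True))  # True
```

In this model, "pin" only stops matching "fountain pen" docs when expansion happens on the whole query rather than at index time, which is why point 3 seems to lead back to query-time synonyms.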

In short, I'm currently using SynonymGraphFilter. If I keep things simple
and go with (1) and (2), I plan to add another copyField that indexes the
synonyms and give it a lower boost than the main field, so that "fountain
pen" docs don't get boosted much for the query "pin", given that other
"pin" docs are more relevant.
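
The copyField plan can be sketched the same way. In this toy model the field names `name` and `name_syn`, the boosts (as if `qf=name^10 name_syn^2`), mm=100% and the flat per-field score are all assumptions; real edismax scores each field with BM25. Like edismax, it takes the best field per query term (plus `tie` times the other fields) and sums over terms.

```python
# Toy edismax-style ranking for the copyField plan. The field names, boosts
# and flat per-field score are hypothetical; real edismax uses BM25 per field.
RULES = {("fountain", "pin"), ("fountain", "pen")}
QF = {"name": 10.0, "name_syn": 2.0}  # as if qf=name^10 name_syn^2


def field_terms(text, expand):
    """Terms indexed in a field; expand=True adds synonym alternatives."""
    words = text.lower().split()
    terms = set(words)
    if expand:
        for pair in zip(words, words[1:]):
            if pair in RULES:
                terms |= {w for rule in RULES for w in rule}
    return terms


def make_doc(text):
    # "name" has no synonyms; "name_syn" stands in for the synonym copyField.
    return {"name": field_terms(text, expand=False),
            "name_syn": field_terms(text, expand=True)}


def edismax_score(doc, query, tie=0.0):
    """Per query term: best field score plus tie * the other fields, summed
    over terms. Returns 0.0 unless every term matches some field (mm=100%)."""
    total = 0.0
    for term in query.lower().split():
        scores = [boost if term in doc[field] else 0.0 for field, boost in QF.items()]
        best = max(scores)
        if best == 0.0:
            return 0.0
        total += best + tie * (sum(scores) - best)
    return total


docs = {text: make_doc(text) for text in ["Parker fountain pen", "Safety pin"]}
ranking = sorted(docs, key=lambda text: edismax_score(docs[text], "pin"), reverse=True)
print(ranking)  # ['Safety pin', 'Parker fountain pen']
```

Note that the lower boost does not stop "Parker fountain pen" from matching "pin"; it only ranks it below docs that contain "pin" in the main field.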


*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*