Handling overlapping synonyms

Handling overlapping synonyms

fiedzia
I have these synonyms defined:

new york -> new_york
new york city -> new_york_city

I'd like the phrase "new york city" to be indexed as both, but
SynonymGraphFilter picks only one. Is there a way around that?
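
For readers unfamiliar with why only one rule fires: Lucene's synonym matching is greedy, so where several rules could apply, the one that starts earliest and consumes the most tokens wins. The following is a toy Python model of that behaviour, not Lucene code; `RULES` and `apply_greedy` are hypothetical names made up for this illustration.

```python
# Toy model of greedy longest-match synonym rewriting.
# NOT Lucene code: RULES and apply_greedy are hypothetical names.
RULES = {
    ("new", "york"): "new_york",
    ("new", "york", "city"): "new_york_city",
}

def apply_greedy(tokens, rules):
    """At each position apply only the longest matching rule, then skip past it."""
    out = []
    i = 0
    max_len = max(len(pattern) for pattern in rules)
    while i < len(tokens):
        # Try the longest candidate window first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in rules:
                out.append(rules[key])
                i += n  # the shorter, overlapping rule never gets a chance
                break
        else:
            # No rule matched here: keep the original token.
            out.append(tokens[i])
            i += 1
    return out

print(apply_greedy("i love new york city".split(), RULES))
# ['i', 'love', 'new_york_city']
```

In this model "new york" inside "new york city" is never rewritten to new_york, which mirrors the behaviour described in the question.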

--
Maciej Dziardziel
[hidden email]



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Handling overlapping synonyms

Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com
If you instead write "new york => new_york, new_york_city" it should work (https://doc.lucidworks.com/fusion/3.1/Collections/Synonyms-Files.html)

In reply to fiedzia's message of 1/17/20, 6:29 AM.

Re: Handling overlapping synonyms

fiedzia
> If you instead write "new york => new_york, new_york_city" it should work

I can't do that, as that would turn "new york" into "new_york_city", which
is not what I want.
Doing it the other way (new york city -> new_york_city, new_york) makes more
sense, though I expect this to get positions wrong and mess with
highlighting.




Re: Handling overlapping synonyms

fiedzia
> Doing it the other way (new york city -> new_york_city, new_york) makes more
> sense,

I just checked: that way the matching works as expected, but the highlighting
is wrong (a "new york" query matches "new york city" as it should, but
highlights all of it).




Re: Re: Handling overlapping synonyms

Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com
Hmm .... what is the reasoning behind adding the bigrams and trigrams manually like that? Maybe if we knew the end goal, we could figure out a different strategy. Happy that at least the matching is working now!

In reply to fiedzia's message of 1/17/20, 10:28 AM.

Re: Re: Handling overlapping synonyms

fiedzia
> what is the reasoning behind adding the bigrams and trigrams manually like
> that? Maybe if we knew the end goal, we could figure out a different
> strategy. Happy that at least the matching is working now!

I have a large number of synonyms and keep adding new ones, and some of them
partially overlap. It's the nature of language that adding keywords to a
phrase creates a distinct meaning. Another example:


sales manager -> director of sales
regional sales manager -> area manager

I'd expect "regional sales manager" to be indexed as both.

regional sales manager
         ^^^^^^^^^^^^^ -> director of sales
^^^^^^^^^^^^^^^^^^^^^^ -> area manager

so that searching for any of those terms matches and highlights only the
relevant part.
However, once SynonymGraphFilter finds one synonym, it ignores the other.




Re: Re: Re: Handling overlapping synonyms

Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com
From my understanding, if you want regional sales manager to be indexed as both director of sales and area manager, you would have to type:

Regional sales manager -> director of sales, area manager

I do not believe you can chain synonyms.

Re: bigrams/trigrams, I was more interested in you wanting to manually create them by inserting a "_" between the tokens. There is a bigram / trigram capability OOTB with Solr, so is there a reason you're manually coding these into your index instead of just using the OOTB function?
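
The out-of-the-box capability mentioned here is presumably Solr's ShingleFilterFactory (an assumption, since the message does not name it). As a rough illustration of what shingling produces, here is a small Python sketch; the `shingles` function and its parameters are hypothetical and only mimic the idea, not Solr's implementation.

```python
def shingles(tokens, min_size=2, max_size=3, sep="_"):
    """Join every run of min_size..max_size adjacent tokens with sep.

    Hypothetical helper for illustration; it is not part of Solr or Lucene.
    """
    out = []
    for i in range(len(tokens)):
        for n in range(min_size, max_size + 1):
            if i + n <= len(tokens):
                out.append(sep.join(tokens[i:i + n]))
    return out

print(shingles("new york city".split()))
# ['new_york', 'new_york_city', 'york_city']
```

Note that shingling emits every n-gram, including ones such as york_city that nobody curated, whereas a synonym file only produces the phrases that were listed explicitly.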

In reply to fiedzia's message of 1/20/20, 6:58 AM.

Re: Re: Re: Handling overlapping synonyms

fiedzia
> From my understanding, if you want regional sales manager to be indexed as
> both director of sales and area manager, you would have to type:
>
> Regional sales manager -> director of sales, area manager

That works for searching, but because everything is in the same position,
searching for "director of sales" highlights the whole "regional sales manager".

Instead, it should be indexed as follows (numbers indicate token positions):

1        2     3
regional sales manager

1
area manager
         2
         director of sales

I guess I'll need to override SynonymGraphFilter to achieve that.
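
To make the desired output concrete, here is a minimal Python sketch of the matching step such a custom filter would need: it reports every rule that matches, together with the position where it starts and how many input tokens it spans (the same idea as Lucene's position length). This is not Lucene code; `RULES` and `all_matches` are hypothetical names, and each multi-word synonym is kept as a single string for simplicity, whereas a real token graph would emit it as separate tokens.

```python
# Toy model: collect ALL matching rules instead of only the longest one.
# NOT Lucene code: RULES and all_matches are hypothetical names.
RULES = {
    ("sales", "manager"): "director of sales",
    ("regional", "sales", "manager"): "area manager",
}

def all_matches(tokens, rules):
    """Return (start_position, position_length, synonym) for every matching rule.

    Positions are 1-based to match the diagram above.
    """
    found = []
    for start in range(len(tokens)):
        for pattern, synonym in rules.items():
            end = start + len(pattern)
            if tuple(tokens[start:end]) == pattern:
                found.append((start + 1, len(pattern), synonym))
    return found

for match in all_matches("regional sales manager".split(), RULES):
    print(match)
# (1, 3, 'area manager')
# (2, 2, 'director of sales')
```

With start positions and spans recorded separately per rule, a highlighter could in principle mark only "sales manager" for a "director of sales" query, which is the behaviour asked for above.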




Re: Re: Re: Re: Handling overlapping synonyms

Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com
Hm, I'm not sure what you mean, but I am pretty new to Solr. Apologies!

In reply to fiedzia's message of 1/20/20, 12:01 PM.