Using Synonyms as a feature with LTR


Using Synonyms as a feature with LTR

Roopa Rao
I am trying to use synonym expansion as a feature in LTR.

Any input on building a feature that performs synonym expansion, given a
field and a synonym file, would be helpful.

Thanks,
Roopa



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Using Synonyms as a feature with LTR

Alessandro Benedetti
In the end, a feature is just a numerical value.
How do you plan to use synonyms in a field to generate a numerical feature?

Are you planning to define a binary feature for a field, set when there is a
match on the synonyms?
Or a feature that contains the score of a query (with synonym expansion)?

I would start from the SolrFeature. Let's assume the "title" field has a
field type that includes synonyms (at query time):

{
    "store" : "featureStore",
    "name" : "hasTitleMatch",
    "class" : "org.apache.solr.ltr.feature.SolrFeature",
    "params" : {
      "fq": [ "{!field f=title}${query}" ]
    }
}

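Query-time analysis will be applied and the synonyms expanded.
With "fq", as far as I know, the feature acts as a match indicator (1 when
the document matches, 0 otherwise). If you put the same query in "q" instead,
the feature value is the score returned for the query and the document being
scored.
You can play with that and design the feature that best fits your idea.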
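
A minimal sketch of the two variants, written in Python only to generate the JSON. The helper title_feature and the feature name "titleScore" are hypothetical; the store name, the "title" field and the ${query} placeholder come from the example above. It assumes that a SolrFeature defined with "fq" alone behaves as a match indicator, while the same query in "q" yields the query score as the feature value.

```python
import json

SOLR_FEATURE = "org.apache.solr.ltr.feature.SolrFeature"


def title_feature(name, use_score, field="title", store="featureStore"):
    """Build one SolrFeature definition (hypothetical helper).

    use_score=False -> query goes in "fq" (assumed: match indicator).
    use_score=True  -> query goes in "q"  (assumed: query score as value).
    """
    # ${query} is a placeholder filled from external feature information.
    query = "{!field f=%s}${query}" % field
    params = {"q": query} if use_score else {"fq": [query]}
    return {"store": store, "name": name, "class": SOLR_FEATURE, "params": params}


features = [
    title_feature("hasTitleMatch", use_score=False),
    title_feature("titleScore", use_score=True),
]

# Print the JSON array that would be uploaded to the feature store.
print(json.dumps(features, indent=2))
```

The ${query} placeholder is filled per request from external feature information, for example efi.query=... passed to the [features] document transformer or to the {!ltr} rerank query.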

Regards








-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io

Re: Using Synonyms as a feature with LTR

Roopa Rao
Thank you, Alessandro.

I tried these options before replying.

Yes, I am looking to generate a score for a query with synonym expansion
(not a binary feature).

I can go with the "title" field and have it include the synonyms in its
analysis. The only problem is that there are quite a lot of fields and
synonym files (~8 synonym files), because of the different weights and
types of expansion (exact vs. partial) they represent. Going with this
approach would therefore mean creating more fields, one for each of these
synonym files (synonyms.txt).

So I am looking to build a custom parser to which I could supply the file
and the field, and which would expand the synonyms and return a score.


Thanks,
Roopa




In reply to Alessandro Benedetti's message of Mon, Feb 12, 2018 at 6:23 AM.

Re: Using Synonyms as a feature with LTR

Alessandro Benedetti
> I can go with the "title" field and have that include the synonyms in
> analysis. Only problem is that the number of fields and number of synonyms
> files are quite a lot (~ 8 synonyms files) due to different weightage and
> type of expansion (exact vs partial) based on these. Hence going with this
> approach would mean creating more fields for all these synonyms
> (synonyms.txt)
>
> So, I am looking to build a custom parser for which I could supply the file
> and the field and that would expand the synonyms and return a score.

Whether the feature is binary or scalar is completely up to you and the way
you configure the Solr feature.
If you have 8 (copy?) fields with the same content but different expansions,
that is still OK.
You can have 8 features, one per type of expansion.
LTR will take care of the weight to be assigned to each of those features.
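
The one-feature-per-expansion idea can be sketched as follows, again using Python only to generate the feature JSON. The field naming scheme (title_syn_exact, title_syn_partial) and the helper synonym_features are assumptions for illustration; each such field would need its own field type pointing at the matching synonym file.

```python
import json


def synonym_features(base_field, expansions, store="featureStore"):
    """Return one scalar SolrFeature per synonym-expanded copy field.

    Field names follow a hypothetical <base>_syn_<expansion> convention.
    """
    features = []
    for expansion in expansions:
        field = f"{base_field}_syn_{expansion}"
        features.append({
            "store": store,
            "name": f"{field}_score",
            "class": "org.apache.solr.ltr.feature.SolrFeature",
            # "q" is used so the feature value is a score, not a 0/1 flag.
            "params": {"q": "{!field f=%s}${query}" % field},
        })
    return features


feature_list = synonym_features("title", ["exact", "partial"])
print(json.dumps(feature_list, indent=2))
```

Because every expansion becomes its own feature, the relative importance of exact versus partial expansion no longer has to be tuned by hand; it is learned when the model is trained.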

> So, I am looking to build a custom parser for which I could supply the file
> and the field and that would expand the synonyms and return a score.

I don't follow this part. Can you elaborate?

Regards




Re: Using Synonyms as a feature with LTR

Roopa Rao
So I would end up with ~6 copy fields and ~8 synonym files, which comes to
about 48 field/synonym combinations. Would that be significant in terms of
index size? I guess that depends on the thesaurus size. What would be the
best way to measure this?

Custom parser:
It would take the synonym file name and the field to run the analysis on.
This field need not be a copy field that holds data, since it is used only
for its analysis chain.
Get the synonyms for the user query as tokens.
Create an edismax query based on the query tokens.
Return the score.

This custom parser would be called in LTR as a scalar feature.

I am at the stage where I can get the synonyms from the analysis chain.
However, they come back as individual tokens, not phrases. So I am stuck on
how to construct a correct query from the synonym tokens and their positions.
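
One possible way to turn individual tokens back into phrases, sketched in Python under stated assumptions: each token carries a start position and a position length (how many positions it spans), as a graph-aware synonym filter would report. The functions expand_paths and to_query and the sample tokens are hypothetical and are not part of Solr or Lucene; the idea is only to walk the token graph and emit one phrase per path.

```python
from collections import defaultdict


def expand_paths(tokens):
    """tokens: iterable of (term, start_position, position_length).

    Treats positions as graph nodes and tokens as edges, and returns every
    sequence of terms that leads from the first node to the last one.
    """
    by_start = defaultdict(list)
    for term, start, length in tokens:
        by_start[start].append((term, start + length))
    if not by_start:
        return []
    first = min(by_start)
    last = max(end for edges in by_start.values() for _, end in edges)

    def walk(node):
        if node == last:
            return [[]]  # reached the end: one empty continuation
        paths = []
        for term, end in by_start.get(node, []):
            for rest in walk(end):
                paths.append([term] + rest)
        return paths

    return walk(first)


def to_query(field, paths):
    """Join the paths into an OR of phrase clauses (terms are not escaped)."""
    return " OR ".join('%s:"%s"' % (field, " ".join(p)) for p in paths)


# "ny pizza" with a synonym rule ny -> new york: "ny" spans two positions.
tokens = [("new", 0, 1), ("ny", 0, 2), ("york", 1, 1), ("pizza", 2, 1)]
paths = expand_paths(tokens)
print(to_query("title", paths))
```

Emitting one phrase per path is only one design choice. The number of paths grows quickly when several multi-word synonyms overlap, so a real parser would probably cap the expansion or build per-position clauses instead.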

Thank you,
Roopa

Follow-up to Roopa Rao's message of Wed, Feb 14, 2018 at 10:12 AM, itself a
reply to Alessandro Benedetti's message of Wed, Feb 14, 2018 at 5:23 AM.

Re: Using Synonyms as a feature with LTR

Alessandro Benedetti
I see.
As far as I know, it is not possible to run different query-time analysis
chains against the same field.

I am not sure whether anyone has been working on that.

Regards




Re: Using Synonyms as a feature with LTR

Roopa Rao
I see. Okay, thank you.

In reply to Alessandro Benedetti's message of Wed, Feb 14, 2018 at 10:34 AM.