Regarding LTR feature

Regarding LTR feature

prateek.agarwal
Hi all,

I'm new to Solr LTR and have been stuck on this problem for a while.

I wanted to ask why the set of documents over which the LTR feature score is
calculated is not restricted by the fq filter, even when we provide one in the
URL, like:

&q=juice&rq={!ltr%20model=my_feature_model%20efi.query=$q%20reRankDocs=300%20efi.store=1}&fq=parent_store_3630_i:%201

Here the feature score calculation should only use the documents returned by
the fq parameter, but that is not really the case. Is it a bug or something
else?
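
For readers who want to reproduce the request, here is a minimal sketch of how the same parameters could be assembled and URL-encoded in Python. The host, port and collection name are hypothetical, and `build_ltr_url` is just an illustrative helper, not part of any Solr client.

```python
from urllib.parse import urlencode

def build_ltr_url(base_url, user_query, store_id, rerank_docs=300):
    """Assemble q, rq (LTR re-rank) and fq for a Solr select request."""
    params = [
        ("q", user_query),
        # Local-params string copied from the post; reRankDocs is made configurable.
        ("rq", "{!ltr model=my_feature_model efi.query=$q "
               f"reRankDocs={rerank_docs} efi.store=1}}"),
        ("fq", f"parent_store_3630_i:{store_id}"),
    ]
    return f"{base_url}?{urlencode(params)}"

# Hypothetical endpoint, used only to show the encoded result.
url = build_ltr_url("http://localhost:8983/solr/mycollection/select", "juice", 1)
print(url)
```

Building the string with `urlencode` avoids hand-written `%20` sequences being split by line wrapping.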

Thanks in advance.


Regards,
Prateek

Re: Regarding LTR feature

Alessandro Benedetti
Hi Prateek,
with a query and FQs, Solr is expected to score a document only if it matches
the query and every filter query, i.e. it lies in the intersection of the
query results and all the FQ results [1].
Re-ranking happens after that, so effectively only the top K documents of that
intersection will be re-ranked.
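
A toy sketch of that order of operations, in Python rather than Solr's actual Java code. The documents, scores and function names are all made up for illustration; only the sequence (intersect, rank, re-rank top K) follows the description above.

```python
def search_with_rerank(docs, matches_query, matches_fq, first_pass_score,
                       model_score, rerank_docs):
    """Toy model of: (query AND fq) -> first-pass ranking -> re-rank top K."""
    # Only documents matching the query AND the filter are ever scored.
    candidates = [d for d in docs if matches_query(d) and matches_fq(d)]
    ranked = sorted(candidates, key=first_pass_score, reverse=True)
    # Only the top K of the intersection are handed to the re-ranking model.
    top, rest = ranked[:rerank_docs], ranked[rerank_docs:]
    return sorted(top, key=model_score, reverse=True) + rest

# Hypothetical documents: "score" is the first-pass score, "ltr" the model score.
docs = [
    {"id": "a", "text": "apple juice", "store": 1, "score": 3.0, "ltr": 0.1},
    {"id": "b", "text": "orange juice", "store": 1, "score": 2.0, "ltr": 0.9},
    {"id": "c", "text": "grape juice", "store": 2, "score": 5.0, "ltr": 1.0},
    {"id": "d", "text": "juice box", "store": 1, "score": 1.0, "ltr": 0.95},
    {"id": "e", "text": "milk", "store": 1, "score": 4.0, "ltr": 0.8},
]

result = search_with_rerank(
    docs,
    matches_query=lambda d: "juice" in d["text"],
    matches_fq=lambda d: d["store"] == 1,
    first_pass_score=lambda d: d["score"],
    model_score=lambda d: d["ltr"],
    rerank_docs=2,
)
print([d["id"] for d in result])  # ['b', 'a', 'd']
```

Document "c" never reaches the model because the filter excludes it, and "d" keeps its first-pass position because it falls outside the top K.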

If you are curious about the code, you can debug this by running a variation
of org.apache.solr.ltr.TestLTRWithFacet#testRankingSolrFacet (with filter
queries added) and setting a breakpoint somewhere around:
org/apache/solr/ltr/LTRRescorer.java:181

Can you elaborate on how you verified that it is currently not working like
that?
I am familiar with the LTR code and I would be surprised to see different
behavior.

[1] https://lucidworks.com/2017/11/27/caching-and-filters-and-post-filters/



-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Regarding LTR feature

prateek.agarwal
In reply to this post by prateek.agarwal
Hi Alessandro,

Thanks for responding.

Let me take a step back and tell you the problem I have been facing with
this. One of the features in my LTR model is:

{
  "store" : "my_feature_store",
  "name" : "in_aggregated_terms",
  "class" : "org.apache.solr.ltr.feature.SolrFeature",
  "params" : { "q" : "{!func}scale(query({!payload_score f=aggregated_terms func=max v=${query}}),0,100)" }
}

With this feature, if I apply an FQ in Solr, the values are scaled over
all the documents, irrespective of the FQ filter.

But if I change the feature to something like this:

{
  "store" : "my_feature_store",
  "name" : "in_aggregated_terms",
  "class" : "org.apache.solr.ltr.feature.SolrFeature",
  "params" : { "q" : "{!func}scale(query({!field f=aggregated_terms v=${query}}),0,100)" }
}

Then it scales properly with the FQ as well.

As for the verification, I simply checked the returned results: in
case 1, after applying the FQ filter, the feature score doesn't reach
its maximum value of 100. I think this is because the value is scaled
over all the documents, while only the subset that passes the FQ filter
is returned.
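
To illustrate the symptom, here is a small Python sketch of min-max scaling to the range 0 to 100, computed once over all documents and once over the filtered subset only. The raw scores and document ids are invented, and the `scale` helper only imitates what a scale(...,0,100) function is described as doing in this post; it is not Solr's implementation.

```python
def scale(values, target_min, target_max):
    """Linearly map values so that min(values) -> target_min, max(values) -> target_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case: every value is identical, so map everything to target_min.
        return [float(target_min) for _ in values]
    return [target_min + (v - lo) * (target_max - target_min) / (hi - lo)
            for v in values]

# Hypothetical raw payload scores; only d1, d2 and d4 pass the filter.
raw = {"d1": 2.0, "d2": 8.0, "d3": 10.0, "d4": 4.0}
passes_fq = {"d1", "d2", "d4"}

# Case 1: scale over every document, then drop the filtered-out ones.
scaled_all = dict(zip(raw, scale(list(raw.values()), 0, 100)))
global_then_filter = {d: s for d, s in scaled_all.items() if d in passes_fq}

# Case 2: scale over the filtered subset only.
subset = {d: v for d, v in raw.items() if d in passes_fq}
filter_then_scale = dict(zip(subset, scale(list(subset.values()), 0, 100)))

print(max(global_then_filter.values()))  # 75.0
print(max(filter_then_scale.values()))   # 100.0
```

In case 1 the document holding the global maximum (d3) is filtered out, so no returned document reaches 100, which matches the behaviour reported above.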

Alternatively, is there any way I can scale these values at
normalization time with a customized class that iterates over the
re-ranked documents only?

Thanks a lot in advance.

Looking forward to hearing back from you soon.


Regards,

Prateek

On 2018/04/30 11:58:44, Alessandro Benedetti <[hidden email]> wrote:

> Hi Prateek,
> with query and FQ Solr is expected to score a document only if that document
> is a match of all the FQ results intersected with the query results [1].
> Then re-ranking happens, so effectively, only the top K intersected
> documents will be re-ranked.
>
> If you are curious about the code, this can be debugged running a variation
> of org.apache.solr.ltr.TestLTRWithFacet#testRankingSolrFacet (introducing
> filter queries ) and setting the breakpoint somewhere around :
> org/apache/solr/ltr/LTRRescorer.java:181
>
> Can you elaborate how you have verified that is currently not working like
> that ?
> I am familiar with LTR code and I would be surprised to see this different
> behavior
>
> [1] https://lucidworks.com/2017/11/27/caching-and-filters-and-post-filters/

Re: Regarding LTR feature

Alessandro Benedetti
In reply to this post by prateek.agarwal
Mmmm, first of all, you know that each Solr feature is calculated per
document, right?
So you want to calculate the payload score for the document you are
re-ranking, based on the query (your External Feature Information), and
normalize it across the different documents?

I would go with this feature and use the LTR normalization functionality:

{
  "store" : "my_feature_store",
  "name" : "in_aggregated_terms",
  "class" : "org.apache.solr.ltr.feature.SolrFeature",
  "params" : { "q" : "{!payload_score f=aggregated_terms func=max v=${query}}" }
}

Then in the model you specify something like:

{
  "name" : "myModelName",
  "features" : [
    {
      "name" : "isBook"
    },
    ...
    {
      "name" : "in_aggregated_terms",
      "norm" : {
        "class" : "org.apache.solr.ltr.norm.MinMaxNormalizer",
        "params" : { "min" : "x", "max" : "y" }
      }
    }
  ],
  ...
}
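
For reference, a min-max normalizer boils down to (value - min) / (max - min). The Python sketch below shows that arithmetic only; the function name is illustrative, and the assumption that out-of-range values are not clamped should be checked against the Solr version in use.

```python
def min_max_normalize(value, min_value, max_value):
    """Map value linearly so that min_value -> 0.0 and max_value -> 1.0."""
    delta = max_value - min_value
    if delta == 0:
        raise ValueError("min and max must differ")
    # No clamping here: inputs outside [min_value, max_value] land outside [0, 1].
    return (value - min_value) / delta

print(min_max_normalize(5.0, 0.0, 10.0))   # 0.5
print(min_max_normalize(15.0, 0.0, 10.0))  # 1.5
```

The second call shows why the choice of "min" and "max" matters: a query-time value above the configured max produces a normalized value above 1.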

Give it a try, let me know





Re: Regarding LTR feature

prateek.agarwal
Thanks again Alessandro

I tried the feature with the MinMaxNormalizer as you suggested, but there is a slight problem with the normalization params: I don't really know the range (min, max) of the values that payload_score outputs, and it differs from query to query.

I even looked at the source code to see whether I could override a class so that it iterates over all the re-ranked documents, calculates the max and min there, and passes them to the MinMaxNormalizer class, but that does not seem to be possible.

Your help will be really appreciated.

Thanks



Regards,
Prateek







On 2018/05/03 14:00:00, Alessandro Benedetti <[hidden email]> wrote:

> Mmmm, first of all, you know that each Solr feature is calculated per
> document right ?
> So you want to calculate the payload score for the document you are
> re-ranking, based on the query ( your External Feature Information) and
> normalize across the different documents?
>
> I would go with this feature and use the normalization LTR functionality :
>
> {
>   "store" : "my_feature_store",
>   "name" : "in_aggregated_terms",
>   "class" : "org.apache.solr.ltr.feature.SolrFeature",
>   "params" : { "q" : "{!payload_score f=aggregated_terms func=max v=${query}}" }
> }
>
> Then in the model you specify something like:
>
> {
>   "name" : "myModelName",
>   "features" : [
>     {
>       "name" : "isBook"
>     },
>     ...
>     {
>       "name" : "in_aggregated_terms",
>       "norm" : {
>         "class" : "org.apache.solr.ltr.norm.MinMaxNormalizer",
>         "params" : { "min" : "x", "max" : "y" }
>       }
>     }
>   ],
>   ...
> }
>
> Give it a try, let me know

Re: Regarding LTR feature

Alessandro Benedetti
In reply to this post by prateek.agarwal
Hi Prateek,
I would assume you have that feature at training time as well. Can't you use
the training set to establish the parameters for the normalizer at query
time?

In the end, being a normalization, it doesn't have to match the query-time
state exactly, but it must reflect the relations the model learnt from
the training set.
Let me know !
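
A minimal sketch of that idea in Python: take the per-feature min and max from the training feature vectors and emit them as normalizer params. The training rows are invented, `norm_params_from_training` is a hypothetical helper, and writing the values as strings simply follows the "min":"x", "max":"y" shape used earlier in the thread.

```python
import json

def norm_params_from_training(rows, feature_names):
    """Return {feature: {"min": ..., "max": ...}} from training feature vectors."""
    params = {}
    for index, name in enumerate(feature_names):
        column = [row[index] for row in rows]
        params[name] = {"min": str(min(column)), "max": str(max(column))}
    return params

# Hypothetical training set: one row per (query, document) pair.
feature_names = ["in_aggregated_terms", "isBook"]
training_rows = [
    [0.0, 1.0],
    [12.5, 0.0],
    [3.0, 1.0],
]

params = norm_params_from_training(training_rows, feature_names)
norm = {
    "class": "org.apache.solr.ltr.norm.MinMaxNormalizer",
    "params": params["in_aggregated_terms"],
}
print(json.dumps(norm, indent=2))
```

The printed object is what would go under "norm" for the in_aggregated_terms feature in the model definition.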




Re: Regarding LTR feature

prateek.agarwal
Hi Alessandro,

You're right, it doesn't have to be that accurate at query time, but our requirement is to have more solid control over the outputs from Solr. For example, if we have 4 features, we can adjust the weights to something like (40,20,20,20) so that the total over all features for a document is at most 100. This is only possible if we can scale each feature output to between 0 and 1.

Secondly, I also have a question about the scale function: why does it consider all the documents that match the query rather than only the documents that pass the FQ filter?

Thanks a lot in advance.
Looking forward to hearing back from you soon.


Regards,
Prateek

On 2018/05/04 10:26:55, Alessandro Benedetti <[hidden email]> wrote:

> Hi Prateek,
> I would assume you have that feature at training time as well, can't you use
> the training set to establish the parameters for the normalizer at query
> time ?
>
> In the end being a normalization, doesn't have to be that accurate to the
> query time state, but it must reflect the relations the model learnt from
> the training set.
> Let me know !

Re: Regarding LTR feature

Alessandro Benedetti
So Prateek :

"You're right it doesn't have to be that accurate to the query time but our
requirement is having a more solid control over our outputs from Solr like
if we have 4 features then we can adjust the weights giving something like
(40,20,20,20) to each feature such that the sum total of features for a
document is 100 this is only possible if we could scale the feature outputs
between 0-1."
You are talking about weights, so I assume you are using a linear Learning To
Rank model.
Which library are you using to train your model?
Does this library allow you to constrain the sum of the linear weights
and normalise the training set per feature?

At query time LTR will just apply the model weights to the query-time
feature vector.
It makes sense to normalise each query-time feature using the training-time
values.
They should be close enough to the training set values (if not, the model is
going to perform poorly anyway and you need to curate the training phase a
little better).
Remember that the model is used to order the results, not to make an
accurate regression prediction.
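
As a rough illustration of what a linear model does at query time, here is a Python sketch using the (40,20,20,20) weights from the quoted message. The feature names other than in_aggregated_terms and all feature values are hypothetical, and the features are assumed to be already normalised to between 0 and 1.

```python
def linear_score(features, weights):
    """Weighted sum of (already normalised) feature values."""
    return sum(weights[name] * features[name] for name in weights)

# Weights from the thread; names f2..f4 are placeholders.
weights = {"in_aggregated_terms": 40, "f2": 20, "f3": 20, "f4": 20}

docs = {
    "doc_a": {"in_aggregated_terms": 1.0, "f2": 1.0, "f3": 1.0, "f4": 1.0},
    "doc_b": {"in_aggregated_terms": 0.5, "f2": 1.0, "f3": 1.0, "f4": 1.0},
    "doc_c": {"in_aggregated_terms": 1.0, "f2": 0.0, "f3": 0.5, "f4": 0.5},
}

scores = {doc_id: linear_score(f, weights) for doc_id, f in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)   # {'doc_a': 100.0, 'doc_b': 80.0, 'doc_c': 60.0}
print(ranking)  # ['doc_a', 'doc_b', 'doc_c']
```

Only the resulting order is used for re-ranking, so a slightly imprecise normaliser changes the scores but often leaves the order intact.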


"Secondly, I also have a doubt regarding the scaling function like why it is
not considering only the documents filtered out by the FQ filter and
considering all the documents which match the query."

At the moment I would not focus on that scenario. I am not convinced that the
LTR SolrFeature is compatible with such a complex function query, and I am not
convinced it is going to be performance friendly anyway.
I would need to investigate that properly.

Regards




Re: Regarding LTR feature

prateek.agarwal
Hi Alessandro,

"You are talking about weights so I assume you are using a linear Learning To
Rank model.
Which library are you using to train your model?
Is this library allowing you to limit the summation of the linear weights
and normalise the training set per feature? "

Yes, we're planning to use a linear LTR model, and we're not using any library
for this. Basically, we currently have an idea of which features are more
important to us than others, so we will adjust the weights accordingly,
e.g. (40,20,20,20), where the first feature is currently the most important
to us.

"At the moment I would not focus on that scenario, I am not very convinced
LTR SolrFeature is compatible with that complex function query, and I am not
very convinced is going to be performance friendly anyway.
I would need to investigate that properly. "

You're completely right that it is not going to be performance friendly,
but I'm pretty sure that the payload_score calculation considers all the
documents that match the query, not the small subset of documents that
pass the FQ filters. I have even verified this using debug=True:
the number of results found after applying the FQ filter was 365, but in the
debug output the docFreq used in the payload_score calculation was
3360 (which is the total number of documents that match the query). It was
hard for me to get my head around that code; maybe you can help. It was worth
a shot, I think.

Thanks Again.


Regards,
Prateek




Re: Regarding LTR feature

Alessandro Benedetti
"FQ_filter were 365 but below in the
debugging part the docfreq used in the payload_score calculation was
3360"

If you are talking about the document frequency of a term, this is
corpus based (necessary for the TF/IDF calculations), so it will not be
affected by the filter queries.
The payload score part may be different.
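
A toy Python sketch of why the document frequency does not move with the filter: it is counted over the whole corpus, while the filter only narrows the set of hits. The corpus is invented, and the BM25-style IDF formula is an assumption used for illustration; the thread itself only mentions TF/IDF.

```python
import math

# Hypothetical six-document corpus.
corpus = [
    {"id": 1, "store": 1, "terms": {"apple", "juice"}},
    {"id": 2, "store": 1, "terms": {"orange", "juice"}},
    {"id": 3, "store": 2, "terms": {"grape", "juice"}},
    {"id": 4, "store": 2, "terms": {"juice", "box"}},
    {"id": 5, "store": 1, "terms": {"milk"}},
    {"id": 6, "store": 2, "terms": {"water"}},
]

def doc_freq(term):
    """Number of documents in the whole corpus containing the term."""
    return sum(1 for doc in corpus if term in doc["terms"])

def idf(term):
    """BM25-style inverse document frequency (assumed formula)."""
    n, total = doc_freq(term), len(corpus)
    return math.log(1 + (total - n + 0.5) / (n + 0.5))

def search(term, fq=None):
    """Return (id, idf) for matching documents; fq only narrows the hits."""
    hits = [d for d in corpus if term in d["terms"] and (fq is None or fq(d))]
    return [(d["id"], idf(term)) for d in hits]

unfiltered = search("juice")
filtered = search("juice", fq=lambda d: d["store"] == 1)
print(len(unfiltered), len(filtered), doc_freq("juice"))  # 4 2 4
```

The filtered search returns fewer hits, yet the IDF attached to each hit is identical in both cases, which is the same pattern as 365 results versus a docFreq of 3360.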

Anyway, you mentioned that you assign the weights yourself; in that case the
Learning To Rank plugin may not be necessary at all.

Regards



