Configurable collectors for custom ranking


Peter Keegan
I looked at SOLR-4465 and SOLR-5045, where the goal appears to be custom
sorting and ranking in a PostFilter. So far, it looks like only custom
aggregation can be implemented in a PostFilter (SOLR-5045). Custom
sorting/ranking can be done in a pluggable collector (SOLR-4465), but
that patch is no longer in development.

Is any other development being done on adding custom sorting (after
collection) via a plugin?

Thanks,
Peter

Re: Configurable collectors for custom ranking

Joel Bernstein
Hi Peter,

I've been meaning to revisit configurable ranking collectors, but I haven't
yet had a chance. It's on the shortlist of things I'd like to tackle
though.



(In reply to Peter Keegan's post of Fri, Dec 6, 2013 at 4:17 PM.)

--
Joel Bernstein
Search Engineer at Heliosearch

Re: Configurable collectors for custom ranking

Peter Keegan
Hi Joel,

This is related to another thread on function query matching (
http://lucene.472066.n3.nabble.com/Function-query-matching-td4099807.html#a4105513).
The patch in SOLR-4465 will allow me to extend TopDocsCollector and perform
the 'scale' function on only the documents matching the main dismax query.
As you mention, it is a slightly intrusive design and requires that I
manage my own PriorityQueue (and a local duplicate of HitQueue), but should
work. I think a better design would hide the PQ from the plugin.
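
For reference, the 'scale' step described above amounts to a min-max rescaling computed over the scores of the matching documents only. A minimal sketch in plain Java, assuming the scores have already been gathered into an array; the class and method names are hypothetical and are not part of Solr or the SOLR-4465 patch:

```java
final class ScoreScaler {
    private ScoreScaler() {}

    /**
     * Linearly maps each score from [min(scores), max(scores)] onto
     * [minTarget, maxTarget]. Only the scores passed in take part in the
     * min/max computation, so unmatched documents never affect the scale.
     */
    static float[] scale(float[] scores, float minTarget, float maxTarget) {
        float[] scaled = new float[scores.length];
        if (scores.length == 0) {
            return scaled;
        }
        float min = Float.POSITIVE_INFINITY;
        float max = Float.NEGATIVE_INFINITY;
        for (float s : scores) {
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        float range = max - min;
        for (int i = 0; i < scores.length; i++) {
            // Assumption: if every score is identical, map all of them to minTarget.
            scaled[i] = (range == 0f)
                    ? minTarget
                    : minTarget + (scores[i] - min) * (maxTarget - minTarget) / range;
        }
        return scaled;
    }
}
```

Because the minimum and maximum are only known once every matching document has been scored, the rescaling cannot run until collection is complete.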

Thanks,
Peter


(In reply to Joel Bernstein's post of Sun, Dec 8, 2013 at 5:32 PM.)

Re: Configurable collectors for custom ranking

Peter Keegan
Quick question:
In the context of a custom collector, how does one get the values of a
field of type 'ExternalFileField'?

Thanks,
Peter


(In reply to Peter Keegan's post of Tue, Dec 10, 2013 at 1:18 PM.)

Re: Configurable collectors for custom ranking

Joel Bernstein
Peter,

It sounds like you could achieve what you want to do in a PostFilter rather
than extending the TopDocsCollector. Is there a reason why a PostFilter
won't work for you?

Joel


(In reply to Peter Keegan's post of Tue, Dec 10, 2013 at 3:24 PM.)

Re: Configurable collectors for custom ranking

Peter Keegan
Hi Joel,

I thought about using a PostFilter, but the problem is that the 'scale'
function must be done after all matching docs have been scored but before
adding them to the PriorityQueue that sorts just the rows to be returned.
Doing the 'scale' function wrapped in a 'query' is proving to be too slow
when it visits every document in the index.

In the Collector, I can see how to get the field values like this:

indexSearcher.getSchema().getField("myfield").getType().getValueSource(SchemaField, QParser).getValues()

But 'getValueSource' needs a QParser, which isn't available, and I can't
create a QParser without a SolrQueryRequest, which isn't available either.

Thanks,
Peter


(In reply to Joel Bernstein's post of Wed, Dec 11, 2013 at 1:48 PM.)

Re: Configurable collectors for custom ranking

Peter Keegan
From the Collector context, I suppose I can access the FileFloatSource
directly like this, although it's not generic:

// fieldName must refer to a field of type ExternalFileField
SchemaField field = indexSearcher.getSchema().getField(fieldName);
String dataDir = indexSearcher.getSchema().getResourceLoader().getDataDir();
ExternalFileField eff = (ExternalFileField) field.getType();
FileFloatSource fieldValues = eff.getFileFloatSource(field, dataDir);

And then read the values in 'setNextReader'.
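
The reason to hook into 'setNextReader' is that a Lucene collector's collect() receives doc ids relative to the current segment, so per-document values have to be re-resolved each time the segment changes. A rough stand-alone sketch of that bookkeeping in plain Java; the float array merely stands in for the external file values, and every name here is hypothetical rather than a Solr or Lucene class:

```java
final class SegmentValueReader {
    // One value per top-level doc id (stand-in for the external field values).
    private final float[] topLevelValues;
    // Offset of the current segment within the top-level doc id space.
    private int docBase;

    SegmentValueReader(float[] topLevelValues) {
        this.topLevelValues = topLevelValues;
    }

    /** Plays the role of setNextReader: remember where the new segment starts. */
    void setNextSegment(int docBase) {
        this.docBase = docBase;
    }

    /** Plays the role of a lookup inside collect(doc): doc is segment-relative. */
    float valueFor(int segmentDoc) {
        return topLevelValues[docBase + segmentDoc];
    }
}
```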

Peter


(In reply to Peter Keegan's post of Wed, Dec 11, 2013 at 2:05 PM.)

Re: Configurable collectors for custom ranking

Joel Bernstein
Here is one approach to use in a PostFilter:

1) In the collect() method, call score() for each doc. Use the scores to
create your scaleInfo.
2) Keep a bitset of the hits and a PriorityQueue of your top X ScoreDocs.
3) Don't delegate any documents to lower collectors in the collect() method.
4) In the finish() method, create a score mapping (use the hppc
IntFloatOpenHashMap) with your top X docIds pointing to their scores, using
the PriorityQueue created in step 2. Then iterate the bitset (also created
in step 2), sending each doc down to the lower collectors while retrieving
and scaling its score from the score map. If the document is not in the
score map, send down 0.

You'll have to set up a dummy scorer to feed to the lower collectors. The
CollapsingQParserPlugin has an example of how to do this.
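
The four steps can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not a real DelegatingCollector: the Downstream interface is a hypothetical stand-in for the lower collectors plus the dummy scorer, java.util.HashMap replaces the hppc IntFloatOpenHashMap to keep the example dependency-free, and the target range for scaling is passed in by the caller.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

final class ScalingPostFilterSketch {
    /** Hypothetical stand-in for the lower collectors fed by a dummy scorer. */
    interface Downstream {
        void collect(int doc, float score);
    }

    private static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int topX;
    private final float minTarget;
    private final float maxTarget;
    private final BitSet hits = new BitSet();
    // Min-heap on score: the head is always the weakest of the current top X.
    private final PriorityQueue<Hit> top =
            new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
    // The "scaleInfo": smallest and largest raw score seen so far.
    private float min = Float.POSITIVE_INFINITY;
    private float max = Float.NEGATIVE_INFINITY;

    ScalingPostFilterSketch(int topX, float minTarget, float maxTarget) {
        this.topX = topX;
        this.minTarget = minTarget;
        this.maxTarget = maxTarget;
    }

    /** Steps 1-3: record the hit, update the scale info, delegate nothing yet. */
    void collect(int doc, float score) {
        hits.set(doc);
        min = Math.min(min, score);
        max = Math.max(max, score);
        top.add(new Hit(doc, score));
        if (top.size() > topX) {
            top.poll(); // evict the lowest-scoring entry
        }
    }

    /** Step 4: replay every hit in doc id order, scaled if in the top X, else 0. */
    void finish(Downstream lower) {
        Map<Integer, Float> topScores = new HashMap<>();
        for (Hit h : top) {
            topScores.put(h.doc, h.score);
        }
        float range = max - min;
        for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
            Float raw = topScores.get(doc);
            float out = 0f;
            if (raw != null) {
                // Assumption: identical scores all map to minTarget.
                out = (range == 0f)
                        ? minTarget
                        : minTarget + (raw - min) * (maxTarget - minTarget) / range;
            }
            lower.collect(doc, out);
        }
    }
}
```

Replaying from the bitset keeps the documents in increasing doc id order, which is the order the lower collectors would have seen them in had collect() delegated directly.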




(In reply to Peter Keegan's post of Wed, Dec 11, 2013 at 2:05 PM.)

Re: Configurable collectors for custom ranking

Peter Keegan
This is what I was looking for, but the DelegatingCollector 'finish' method
doesn't exist in 4.3.0 :( Can this be patched in, and are there any other
PostFilter dependencies on 4.5?

Thanks,
Peter


(In reply to Joel Bernstein's post of Wed, Dec 11, 2013 at 3:16 PM.)

Re: Configurable collectors for custom ranking

Joel Bernstein
SOLR-5020 has the commit info; it's mainly changes to SolrIndexSearcher, I
believe. They might apply to 4.3.
I think as long as you have the finish method, that's all you'll need. If you
can get this working, it would be excellent if you could donate back the
Scale PostFilter.


(In reply to Peter Keegan's post of Wed, Dec 11, 2013 at 3:36 PM.)

Re: Configurable collectors for custom ranking

Peter Keegan
Thanks very much for the guidance. I'd be happy to donate a working
solution.

Peter


(In reply to Joel Bernstein's post of Wed, Dec 11, 2013 at 3:53 PM.)
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> On Fri, Dec 6, 2013 at 4:17 PM, Peter Keegan <
> > > > [hidden email]>
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >> > I looked at SOLR-4465 and SOLR-5045, where it appears that
> > there
> > > > is
> > > > > a
> > > > > > >> goal
> > > > > > >> > to be able to do custom sorting and ranking in a PostFilter.
> > So
> > > > far,
> > > > > > it
> > > > > > >> > looks like only custom aggregation can be implemented in
> > > > PostFilter
> > > > > > >> (5045).
> > > > > > >> > Custom sorting/ranking can be done in a pluggable collector
> > > > (4465),
> > > > > > but
> > > > > > >> > this patch is no longer in dev.
> > > > > > >> >
> > > > > > >> > Is there any other dev. being done on adding custom sorting
> > > (after
> > > > > > >> > collection) via a plugin?
> > > > > > >> >
> > > > > > >> > Thanks,
> > > > > > >> > Peter
> > > > > > >> >
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> --
> > > > > > >> Joel Bernstein
> > > > > > >> Search Engineer at Heliosearch
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Joel Bernstein
> > > > > Search Engineer at Heliosearch
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Joel Bernstein
> > > Search Engineer at Heliosearch
> > >
> >
>
>
>
> --
> Joel Bernstein
> Search Engineer at Heliosearch
>

Re: Configurable collectors for custom ranking

Peter Keegan
Regarding my original goal, which is to perform a math function using the
scaled score and a field value, and sort on the result, how does this fit
in? Must I implement another custom PostFilter with a higher cost than the
scale PostFilter?
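
To make the goal concrete, here is a minimal plain-Java sketch of the computation itself, outside of Solr. It assumes the scale step is a min-max scaling of the raw scores into [0, 1] over the matching documents only, and that the math function is a simple sum with the field value; both are assumptions for illustration. Hit, Ranked and rank() are hypothetical names, not Solr or Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only: Hit, Ranked and rank() are hypothetical
// names, not Solr or Lucene classes.
final class ScaledScoreSortSketch {

    // One matching document: its id, raw query score, and a numeric field value.
    static final class Hit {
        final int docId;
        final float rawScore;
        final float fieldValue;

        Hit(int docId, float rawScore, float fieldValue) {
            this.docId = docId;
            this.rawScore = rawScore;
            this.fieldValue = fieldValue;
        }
    }

    // A document id paired with the value it is finally sorted on.
    static final class Ranked {
        final int docId;
        final float sortValue;

        Ranked(int docId, float sortValue) {
            this.docId = docId;
            this.sortValue = sortValue;
        }
    }

    // Scales raw scores to [0, 1] over the given hits, adds the field value,
    // and returns the hits ordered by the combined value, highest first.
    static List<Ranked> rank(List<Hit> hits) {
        // Pass 1: the min and max can only be known once every match is scored.
        float min = Float.POSITIVE_INFINITY;
        float max = Float.NEGATIVE_INFINITY;
        for (Hit h : hits) {
            min = Math.min(min, h.rawScore);
            max = Math.max(max, h.rawScore);
        }
        float range = max - min;

        // Pass 2: scale each score, then apply the math function (assumed: sum).
        List<Ranked> ranked = new ArrayList<>();
        for (Hit h : hits) {
            float scaled = (range == 0f) ? 0f : (h.rawScore - min) / range;
            ranked.add(new Ranked(h.docId, scaled + h.fieldValue));
        }

        // Sort on the combined value, descending.
        ranked.sort((a, b) -> Float.compare(b.sortValue, a.sortValue));
        return ranked;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(1, 2.0f, 0.5f));
        hits.add(new Hit(2, 4.0f, 0.1f));
        hits.add(new Hit(3, 3.0f, 0.9f));
        for (Ranked r : rank(hits)) {
            System.out.println(r.docId + " " + r.sortValue);
        }
    }
}
```

The two passes are the reason the scaling cannot happen while documents are still being collected: the scaled value of any one document depends on the scores of all the others.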

Thanks,
Peter



Re: Configurable collectors for custom ranking

Joel Bernstein
The sorting is going to happen in the lower level collectors. You need a
value source that returns the score of the document being collected.

Here is how you can make this happen:

1) Create an object in your PostFilter that simply holds the current score.
Place this object in the SearchRequest context map. Update object.score as
you pass the docs and scores to the lower collectors.

2) Create a value source that checks the SearchRequest context for the
object that's holding the current score. Use this object to return the
current score when called. For example, if you give the value source a
handle called "score", a compound function call will look like this:
sum(score(), field(x))
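
The two steps above can be sketched roughly as follows. This is a plain-Java model of the idea, not code against the Solr API: the context is an ordinary Map, and ScoreHolder, SCORE_KEY, scoreFunction and delegateAll are hypothetical names chosen for illustration. In a real plugin the holder would live in the request context and the function would be registered as a value source.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model only: every name here is hypothetical, and the Map
// stands in for the request context shared by the filter and the function.
final class CurrentScoreSketch {

    // Hypothetical key under which the holder is stored in the context.
    static final String SCORE_KEY = "currentScore";

    // Step 1: a mutable object that simply holds the current score.
    static final class ScoreHolder {
        float score;
    }

    // Step 2: stand-in for a "score()" value source. It looks the holder up
    // in the context and returns whatever score is current at call time.
    static float scoreFunction(Map<Object, Object> context) {
        ScoreHolder holder = (ScoreHolder) context.get(SCORE_KEY);
        return holder == null ? 0f : holder.score;
    }

    // Stand-in for the compound call sum(score(), field(x)).
    static float sumScoreAndField(Map<Object, Object> context, float[] fieldX, int docId) {
        return scoreFunction(context) + fieldX[docId];
    }

    // Stand-in for the post filter handing docs to the lower collectors:
    // the holder is updated first, then the doc is passed down, so the
    // function evaluated downstream sees that doc's score.
    static float[] delegateAll(Map<Object, Object> context, float[] scaledScores, float[] fieldX) {
        ScoreHolder holder = new ScoreHolder();
        context.put(SCORE_KEY, holder);

        float[] sortValues = new float[scaledScores.length];
        for (int doc = 0; doc < scaledScores.length; doc++) {
            holder.score = scaledScores[doc];
            sortValues[doc] = sumScoreAndField(context, fieldX, doc);
        }
        return sortValues;
    }

    public static void main(String[] args) {
        Map<Object, Object> context = new HashMap<>();
        float[] scaledScores = {0.0f, 1.0f, 0.5f};
        float[] fieldX = {0.5f, 0.1f, 0.9f};
        float[] sortValues = delegateAll(context, scaledScores, fieldX);
        for (int doc = 0; doc < sortValues.length; doc++) {
            System.out.println(doc + " " + sortValues[doc]);
        }
    }
}
```

The holder is shared by reference, so the function never needs to know which document is being collected; it only needs the filter to update the holder before each document is passed down.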

Joel













--
Joel Bernstein
Search Engineer at Heliosearch

Re: Configurable collectors for custom ranking

Peter Keegan
This is pretty cool, and worthy of adding to Solr in Action (v2) and the
other books. With function queries, flexible filter processing and caching,
custom collectors, and post filters, there's a lot of flexibility here.

Btw, the query times using a custom collector to scale/recompute scores are
excellent (will have to see how they compare to your outlined solution).

Thanks,
Peter



Re: Configurable collectors for custom ranking

Joel Bernstein
Thanks, I agree this is powerful stuff. One of the reasons that I haven't
gotten back to pluggable collectors is that I've been using PostFilters
instead.

When you start doing stuff with scores in postfilters you'll run into the
bug in SOLR-5416. This will affect you when you use facets in combination
with the QueryResultCache or tag and exclude faceting.

The patch in SOLR-5416 resolves this issue. You'll just need your
PostFilter to implement ScoreFilter and the SolrIndexSearcher will know how
to handle things.

The DelegatingCollector.finish() method is so new, these kinds of bugs are
still being cleaned out of the system. SOLR-5416 should be in Solr 4.7.









> > > >> > > > > Peter,
> > > >> > > > >
> > > >> > > > > It sounds like you could achieve what you want to do in a
> > > >> PostFilter
> > > >> > > > rather
> > > >> > > > > then extending the TopDocsCollector. Is there a reason why a
> > > >> > PostFilter
> > > >> > > > > won't work for you?
> > > >> > > > >
> > > >> > > > > Joel
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > On Tue, Dec 10, 2013 at 3:24 PM, Peter Keegan <
> > > >> > [hidden email]
> > > >> > > > > >wrote:
> > > >> > > > >
> > > >> > > > > > Quick question:
> > > >> > > > > > In the context of a custom collector, how does one get the
> > > >> values
> > > >> > of
> > > >> > > a
> > > >> > > > > > field of type 'ExternalFileField'?
> > > >> > > > > >
> > > >> > > > > > Thanks,
> > > >> > > > > > Peter
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > On Tue, Dec 10, 2013 at 1:18 PM, Peter Keegan <
> > > >> > > [hidden email]
> > > >> > > > > > >wrote:
> > > >> > > > > >
> > > >> > > > > > > Hi Joel,
> > > >> > > > > > >
> > > >> > > > > > > This is related to another thread on function query
> > > matching (
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> http://lucene.472066.n3.nabble.com/Function-query-matching-td4099807.html#a4105513
> > > >> > > > > > ).
> > > >> > > > > > > The patch in SOLR-4465 will allow me to extend
> > > >> TopDocsCollector
> > > >> > and
> > > >> > > > > > perform
> > > >> > > > > > > the 'scale' function on only the documents matching the
> > main
> > > >> > dismax
> > > >> > > > > > query.
> > > >> > > > > > > As you mention, it is a slightly intrusive design and
> > > requires
> > > >> > > that I
> > > >> > > > > > > manage my own PriorityQueue (and a local duplicate of
> > > >> HitQueue),
> > > >> > > but
> > > >> > > > > > should
> > > >> > > > > > > work. I think a better design would hide the PQ from the
> > > >> plugin.
> > > >> > > > > > >
> > > >> > > > > > > Thanks,
> > > >> > > > > > > Peter
> > > >> > > > > > >
> > > >> > > > > > >
> > > >> > > > > > > On Sun, Dec 8, 2013 at 5:32 PM, Joel Bernstein <
> > > >> > [hidden email]
> > > >> > > >
> > > >> > > > > > wrote:
> > > >> > > > > > >
> > > >> > > > > > >> Hi Peter,
> > > >> > > > > > >>
> > > >> > > > > > >> I've been meaning to revisit configurable ranking
> > > collectors,
> > > >> > but
> > > >> > > I
> > > >> > > > > > >> haven't
> > > >> > > > > > >> yet had a chance. It's on the shortlist of things I'd
> > like
> > > to
> > > >> > > tackle
> > > >> > > > > > >> though.
> > > >> > > > > > >>
> > > >> > > > > > >>
> > > >> > > > > > >>
> > > >> > > > > > >> On Fri, Dec 6, 2013 at 4:17 PM, Peter Keegan <
> > > >> > > > [hidden email]>
> > > >> > > > > > >> wrote:
> > > >> > > > > > >>
> > > >> > > > > > >> > I looked at SOLR-4465 and SOLR-5045, where it appears
> > > that
> > > >> > there
> > > >> > > > is
> > > >> > > > > a
> > > >> > > > > > >> goal
> > > >> > > > > > >> > to be able to do custom sorting and ranking in a
> > > >> PostFilter.
> > > >> > So
> > > >> > > > far,
> > > >> > > > > > it
> > > >> > > > > > >> > looks like only custom aggregation can be implemented
> > in
> > > >> > > > PostFilter
> > > >> > > > > > >> (5045).
> > > >> > > > > > >> > Custom sorting/ranking can be done in a pluggable
> > > collector
> > > >> > > > (4465),
> > > >> > > > > > but
> > > >> > > > > > >> > this patch is no longer in dev.
> > > >> > > > > > >> >
> > > >> > > > > > >> > Is there any other dev. being done on adding custom
> > > sorting
> > > >> > > (after
> > > >> > > > > > >> > collection) via a plugin?
> > > >> > > > > > >> >
> > > >> > > > > > >> > Thanks,
> > > >> > > > > > >> > Peter
> > > >> > > > > > >> >
> > > >> > > > > > >>
> > > >> > > > > > >>
> > > >> > > > > > >>
> > > >> > > > > > >> --
> > > >> > > > > > >> Joel Bernstein
> > > >> > > > > > >> Search Engineer at Heliosearch
> > > >> > > > > > >>
> > > >> > > > > > >
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > --
> > > >> > > > > Joel Bernstein
> > > >> > > > > Search Engineer at Heliosearch
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > --
> > > >> > > Joel Bernstein
> > > >> > > Search Engineer at Heliosearch
> > > >> > >
> > > >> >
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> Joel Bernstein
> > > >> Search Engineer at Heliosearch
> > > >>
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > Joel Bernstein
> > Search Engineer at Heliosearch
> >
>



--
Joel Bernstein
Search Engineer at Heliosearch

Re: Configurable collectors for custom ranking

Peter Keegan
In order to size the PriorityQueue, the result window size for the query is
needed. This has already been computed in the SolrIndexSearcher and is
available via QueryCommand.getSupersetMaxDoc(), but it doesn't seem to be
available to the PostFilter in either the SolrParams or the SolrQueryRequest.
Is there a way to get this precomputed value, or do I have to duplicate the
logic from SolrIndexSearcher?
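
As a minimal sketch of the kind of bounded queue in question: the class
below keeps only the top N hits in a min-heap built on
java.util.PriorityQueue (not Lucene's own PriorityQueue or HitQueue). The
names TopN and Hit are hypothetical, and sizing the queue as start + rows is
an assumption made purely for illustration; the superset value that Solr
computes may well be larger, so treat the sizing here as a placeholder.

```java
import java.util.PriorityQueue;

/** Hypothetical bounded top-N queue; not Lucene's PriorityQueue or HitQueue. */
final class TopN {
    /** One collected hit. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final int capacity;
    // Min-heap on score: the weakest retained hit is always at the head.
    private final PriorityQueue<Hit> heap;

    /** Assumption: the window is simply start + rows. */
    TopN(int start, int rows) {
        this.capacity = Math.max(0, start + rows);
        // java.util.PriorityQueue rejects an initial capacity below 1.
        this.heap = new PriorityQueue<Hit>(Math.max(1, capacity),
                (Hit a, Hit b) -> Float.compare(a.score, b.score));
    }

    /** Keeps the hit only if it belongs in the current top N. */
    void offer(int doc, float score) {
        if (capacity == 0) {
            return;
        }
        if (heap.size() < capacity) {
            heap.add(new Hit(doc, score));
        } else if (score > heap.peek().score) {
            // Evict the weakest retained hit to make room for the better one.
            heap.poll();
            heap.add(new Hit(doc, score));
        }
    }

    int size() {
        return heap.size();
    }

    /** Lowest score still retained; only meaningful when size() > 0. */
    float minScore() {
        return heap.peek().score;
    }
}
```

Whatever the right value of N turns out to be, only the constructor argument
changes; the offer() logic stays the same.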

Thanks,
Peter



Re: Configurable collectors for custom ranking

Peter Keegan
I implemented the PostFilter approach described by Joel. Just iterating
over the OpenBitSet, even without the scaling or the HashMap lookup, added
30 ms to the query time, which kinda surprised me. There were about 150K hits
out of a total of 500K docs. Is OpenBitSet the best way to do this?
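
For reference, the replay loop being described can be sketched as below.
This uses java.util.BitSet as a stand-in for Lucene's OpenBitSet, and
BitSetReplay and Sink are hypothetical names. It only illustrates the
iterate-the-set-bits step, not the scaling or the score-map lookup.

```java
import java.util.BitSet;

/** Hypothetical sketch: replay collected hits from a bitset in docId order. */
final class BitSetReplay {
    /** Hypothetical callback standing in for the lower collector. */
    interface Sink {
        void accept(int doc);
    }

    /** Visits every set bit in ascending order and returns how many were seen. */
    static int replay(BitSet hits, Sink sink) {
        int count = 0;
        for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
            sink.accept(doc);
            count++;
            if (doc == Integer.MAX_VALUE) {
                break; // doc + 1 would overflow
            }
        }
        return count;
    }
}
```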

Thanks,
Peter



Re: Configurable collectors for custom ranking

Joel Bernstein
Hi Peter,

The fastest approach would be to keep two parallel hppc lists: a
FloatArrayList for the scores and an IntArrayList for the docs. Just add the
docs and scores at collect time and iterate over them in finish(). You'll be
using more memory, but if you're looking for the best possible performance
then this might be the way to go.
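
A minimal sketch of that idea, assuming plain growable arrays in place of
the hppc FloatArrayList/IntArrayList so that it runs without extra
dependencies. ScoreBuffer and Sink are hypothetical names, and the min/max
scaling stands in for whatever scale function is actually needed.

```java
import java.util.Arrays;

/** Hypothetical helper: parallel doc/score buffers filled at collect time. */
final class ScoreBuffer {
    /** Hypothetical callback standing in for the lower collector. */
    interface Sink {
        void accept(int doc, float scaledScore);
    }

    private int[] docs = new int[1024];
    private float[] scores = new float[1024];
    private int size = 0;
    private float min = Float.POSITIVE_INFINITY;
    private float max = Float.NEGATIVE_INFINITY;

    /** Called once per hit, in collection order; nothing is delegated yet. */
    void collect(int doc, float score) {
        if (size == docs.length) {
            // Grow both arrays together so index i always refers to the same hit.
            docs = Arrays.copyOf(docs, size * 2);
            scores = Arrays.copyOf(scores, size * 2);
        }
        docs[size] = doc;
        scores[size] = score;
        size++;
        min = Math.min(min, score);
        max = Math.max(max, score);
    }

    /** Called once at the end: replay every hit with its score scaled into [lo, hi]. */
    void finish(float lo, float hi, Sink sink) {
        float range = max - min;
        for (int i = 0; i < size; i++) {
            // If all scores are equal, map everything to lo to avoid dividing by zero.
            float scaled = range == 0f ? lo : lo + (scores[i] - min) / range * (hi - lo);
            sink.accept(docs[i], scaled);
        }
    }

    int size() {
        return size;
    }
}
```

The trade-off is memory: 8 bytes per hit (one int plus one float), versus
one bit per document in the index for the bitset.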

Joel


On Thu, Dec 19, 2013 at 3:25 PM, Peter Keegan <[hidden email]> wrote:

> I implemented the PostFilter approach described by Joel. Just iterating
> over the OpenBitSet, even without the scaling or the HashMap lookup, added
> 30ms to a query time, which kinda surprised me. There were about 150K hits
> out of a total of 500K. Is OpenBitSet the best way to do this?
>
> Thanks,
> Peter



--
Joel Bernstein
Search Engineer at Heliosearch

Re: Configurable collectors for custom ranking

Peter Keegan
In reply to this post by Joel Bernstein
Hi Joel,

Could you clarify what would be in the key/value Map added to the
SearchRequest context? It seems that all the docId/score tuples need to be
there, including the ones not in the 'top N ScoreDocs' PriorityQueue
(score=0). If so, would the Map be something like:
"scaled_scores", Map<Integer,Float>?

Also, what is the reason for passing score=0 for documents that aren't in
the PriorityQueue? Will these docs get filtered out before a normal sort by
score?

Thanks,
Peter


On Thu, Dec 12, 2013 at 11:13 AM, Joel Bernstein <[hidden email]> wrote:

> The sorting is going to happen in the lower level collectors. You need a
> value source that returns the score of the document being collected.
>
> Here is how you can make this happen:
>
> 1) Create an object in your PostFilter that simply holds the current score.
> Place this object in the SearchRequest context map. Update object.score as
> you pass the docs and scores to the lower collectors.
>
> 2) Create a values source that checks the SearchRequest context for the
> object that's holding the current score. Use this object to return the
> current score when called. For example if you give the value source a
> handle called "score" a compound function call will look like this:
> sum(score(), field(x))
>
> Joel

Re: Configurable collectors for custom ranking

Joel Bernstein
Peter,

You actually only need the current score being collected to be in the
request context. So you don't need a map, you just need an object wrapper
around a mutable float.

If you have a page size of X, only the top X scores need to be held onto,
because all the other scores wouldn't have made it into that page anyway, so
they might as well be 0. Because the QueryResultCache caches a larger
window than the page size, you should keep enough scores so the cached
docList is correct. But if you're only dealing with 150K results, you
could just keep all the scores in a FloatArrayList and not worry about
keeping the top X scores in a priority queue.
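
For the case where keeping every score is too expensive, the top-X idea can
be sketched roughly like this. This is only an illustration: TopScores and
Hit are hypothetical names (Hit stands in for Lucene's ScoreDoc), a
java.util.PriorityQueue and HashMap stand in for the Lucene PriorityQueue and
the hppc map, and the capacity is assumed to be at least as large as the
window the QueryResultCache will cache, not just the page size.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

/** Keeps only the X highest-scoring hits seen so far. */
final class TopScores {
    /** A (doc, score) pair; a hypothetical stand-in for Lucene's ScoreDoc. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final int capacity;
    // Min-heap on score: the head is always the weakest hit currently kept.
    private final PriorityQueue<Hit> heap =
        new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));

    TopScores(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Collect phase: keep the hit only if it belongs in the current top X. */
    void offer(int doc, float score) {
        if (heap.size() < capacity) {
            heap.add(new Hit(doc, score));
        } else if (score > heap.peek().score) {
            heap.poll();                      // evict the weakest kept hit
            heap.add(new Hit(doc, score));
        }
    }

    /** Finish phase: docId -> score for the kept hits; every other doc is treated as 0. */
    Map<Integer, Float> toScoreMap() {
        Map<Integer, Float> map = new HashMap<>();
        for (Hit h : heap) {
            map.put(h.doc, h.score);
        }
        return map;
    }
}
```

In the finish phase, a lookup such as map.getOrDefault(doc, 0f) then gives
the score to send down for each doc: the kept score for the top X, and 0 for
everything else.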

During collect(), hang onto the docIds and scores and build your scaling
info.

During finish(), iterate over your docIds and scale the scores as you go.

Set each scaled score into the object wrapper that is in the request
context before you pass that document to the delegate collectors.

When you call collect on the delegate collectors they will call the custom
value source for each document to perform the sort. Your custom value
source will return whatever the float value is in the request context at
that time.
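
The interaction between the post filter and the value source might look
something like the sketch below. Everything here is an assumption made for
illustration: a plain Map plays the role of the request context,
ScoreHolder, KEY, replay() and score() are hypothetical names, and an
IntConsumer stands in for the delegate collector. None of it is the actual
Solr API.

```java
import java.util.Map;
import java.util.function.IntConsumer;

final class CurrentScoreDemo {
    /** Mutable wrapper around a float; one instance is shared through the context map. */
    static final class ScoreHolder {
        float score;
    }

    /** Hypothetical context key; any name agreed on by both sides would do. */
    static final String KEY = "currentScore";

    /** Post-filter side: publish each scaled score, then hand the doc to the delegate. */
    static void replay(int[] docs, float[] scaled, Map<Object, Object> context,
                       IntConsumer delegate) {
        ScoreHolder holder =
            (ScoreHolder) context.computeIfAbsent(KEY, k -> new ScoreHolder());
        for (int i = 0; i < docs.length; i++) {
            holder.score = scaled[i];   // must be set before the delegate sees the doc
            delegate.accept(docs[i]);
        }
    }

    /** Value-source side: "score()" returns whatever the holder contains right now. */
    static float score(Map<Object, Object> context) {
        return ((ScoreHolder) context.get(KEY)).score;
    }
}
```

A delegate that sorts on sum(score(), field(x)) would then compute
score(context) + x[doc] for each doc it is handed. Only one float is ever
stored, which is why no docId-to-score map is needed in the context.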

If you're also going to run this postfilter when you're doing a standard
rank by score you'll also need to send down a dummy scorer to the delegate
collectors. Spend some time with the CollapsingQParserPlugin in trunk to
see how the dummy scorer works.

I'll be adding value source collapse criteria to the
CollapsingQParserPlugin this week and it will have a similar interaction
between a PostFilter and value source. So you may want to watch SOLR-5536
to see an example of this.

Joel












Joel Bernstein
Search Engineer at Heliosearch


On Mon, Dec 23, 2013 at 4:03 PM, Peter Keegan <[hidden email]> wrote:

> Hi Joel,
>
> Could you clarify what would be in the key,value Map added to the
> SearchRequest context? It seems that all the docId/score tuples need to be
> there, including the ones not in the 'top N ScoreDocs' PriorityQueue
> (score=0). If so would the Map be something like:
> "scaled_scores",Map<Integer,Float> ?
>
> Also, what is the reason for passing score=0 for documents that aren't in
> the PriorityQueue? Will these docs get filtered out before a normal sort by
> score?
>
> Thanks,
> Peter
>
>
> On Thu, Dec 12, 2013 at 11:13 AM, Joel Bernstein <[hidden email]>
> wrote:
>
> > The sorting is going to happen in the lower level collectors. You need a
> > value source that returns the score of the document being collected.
> >
> > Here is how you can make this happen:
> >
> > 1) Create an object in your PostFilter that simply holds the current
> score.
> > Place this object in the SearchRequest context map. Update object.score
> as
> > you pass the docs and scores to the lower collectors.
> >
> > 2) Create a values source that checks the SearchRequest context for the
> > object that's holding the current score. Use this object to return the
> > current score when called. For example if you give the value source a
> > handle called "score" a compound function call will look like this:
> > sum(score(), field(x))
> >
> > Joel
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On Thu, Dec 12, 2013 at 9:58 AM, Peter Keegan <[hidden email]
> > >wrote:
> >
> > > Regarding my original goal, which is to perform a math function using
> the
> > > scaled score and a field value, and sort on the result, how does this
> fit
> > > in? Must I implement another custom PostFilter with a higher cost than
> > the
> > > scale PostFilter?
> > >
> > > Thanks,
> > > Peter
> > >
> > >
> > > On Wed, Dec 11, 2013 at 4:01 PM, Peter Keegan <[hidden email]
> > > >wrote:
> > >
> > > > Thanks very much for the guidance. I'd be happy to donate a working
> > > > solution.
> > > >
> > > > Peter
> > > >
> > > >
> > > > On Wed, Dec 11, 2013 at 3:53 PM, Joel Bernstein <[hidden email]
> > > >wrote:
> > > >
> > > >> SOLR-5020 has the commit info, it's mainly changes to
> > SolrIndexSearcher
> > > I
> > > >> believe. They might apply to 4.3.
> > > >> I think as long you have the finish method that's all you'll need.
> If
> > > you
> > > >> can get this working it would be excellent if you could donate back
> > the
> > > >> Scale PostFilter.
> > > >>
> > > >>
> > > >> On Wed, Dec 11, 2013 at 3:36 PM, Peter Keegan <
> [hidden email]
> > > >> >wrote:
> > > >>
> > > >> > This is what I was looking for, but the DelegatingCollector
> 'finish'
> > > >> method
> > > >> > doesn't exist in 4.3.0 :(   Can this be patched in and are there
> any
> > > >> other
> > > >> > PostFilter dependencies on 4.5?
> > > >> >
> > > >> > Thanks,
> > > >> > Peter
> > > >> >
> > > >> >
> > > >> > On Wed, Dec 11, 2013 at 3:16 PM, Joel Bernstein <
> [hidden email]
> > >
> > > >> > wrote:
> > > >> >
> > > >> > > Here is one approach to use in a postfilter
> > > >> > >
> > > >> > > 1) In the collect() method call score for each doc. Use the
> scores
> > > to
> > > >> > > create your scaleInfo.
> > > >> > > 2) Keep a bitset of the hits and a priorityQueue of your top X
> > > >> ScoreDocs.
> > > >> > > 3) Don't delegate any documents to lower collectors in the
> > collect()
> > > >> > > method.
> > > >> > > 4) In the finish method create a score mapping (use the hppc
> > > >> > > IntFloatOpenHashMap) with your top X docIds pointing to their
> > score,
> > > >> > using
> > > >> > > the priorityQueue created in step 2. Then iterate the bitset
> (also
> > > >> > created
> > > >> > > in step 2) sending down each doc to the lower collectors,
> > retrieving
> > > >> and
> > > >> > > scaling the score from the score map. If the document is not in
> > the
> > > >> score
> > > >> > > map then send down 0.
> > > >> > >
> > > >> > > You'll have setup a dummy scorer to feed to lower collectors.
> The
> > > >> > > CollapsingQParserPlugin has an example of how to do this.
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > On Wed, Dec 11, 2013 at 2:05 PM, Peter Keegan <
> > > [hidden email]
> > > >> > > >wrote:
> > > >> > >
> > > >> > > > Hi Joel,
> > > >> > > >
> > > >> > > > I thought about using a PostFilter, but the problem is that
> the
> > > >> 'scale'
> > > >> > > > function must be done after all matching docs have been scored
> > but
> > > >> > before
> > > >> > > > adding them to the PriorityQueue that sorts just the rows to
> be
> > > >> > returned.
> > > >> > > > Doing the 'scale' function wrapped in a 'query' is proving to
> be
> > > too
> > > >> > slow
> > > >> > > > when it visits every document in the index.
> > > >> > > >
> > > >> > > > In the Collector, I can see how to get the field values like
> > this:
> > > >> > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> indexSearcher.getSchema().getField("field(myfield").getType().getValueSource(SchemaField,
> > > >> > > > QParser).getValues()
> > > >> > > >
> > > >> > > > But, 'getValueSource' needs a QParser, which isn't available.
> > > >> > > > And I can't create a QParser without a SolrQueryRequest, which
> > > isn't
> > > >> > > > available.
> > > >> > > >
> > > >> > > > Thanks,
> > > >> > > > Peter
> > > >> > > >
> > > >> > > >
> > > >> > > > On Wed, Dec 11, 2013 at 1:48 PM, Joel Bernstein <
> > > [hidden email]
> > > >> >
> > > >> > > > wrote:
> > > >> > > >
> > > >> > > > > Peter,
> > > >> > > > >
> > > >> > > > > It sounds like you could achieve what you want to do in a
> > > >> PostFilter
> > > >> > > > rather
> > > >> > > > > then extending the TopDocsCollector. Is there a reason why a
> > > >> > PostFilter
> > > >> > > > > won't work for you?
> > > >> > > > >
> > > >> > > > > Joel
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > On Tue, Dec 10, 2013 at 3:24 PM, Peter Keegan <
> > > >> > [hidden email]
> > > >> > > > > >wrote:
> > > >> > > > >
> > > >> > > > > > Quick question:
> > > >> > > > > > In the context of a custom collector, how does one get the
> > > >> values
> > > >> > of
> > > >> > > a
> > > >> > > > > > field of type 'ExternalFileField'?
> > > >> > > > > >
> > > >> > > > > > Thanks,
> > > >> > > > > > Peter
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > On Tue, Dec 10, 2013 at 1:18 PM, Peter Keegan <
> > > >> > > [hidden email]
> > > >> > > > > > >wrote:
> > > >> > > > > >
> > > >> > > > > > > Hi Joel,
> > > >> > > > > > >
> > > >> > > > > > > This is related to another thread on function query matching
> > > >> > > > > > > (http://lucene.472066.n3.nabble.com/Function-query-matching-td4099807.html#a4105513).
> > > >> > > > > > > The patch in SOLR-4465 will allow me to extend TopDocsCollector and
> > > >> > > > > > > perform the 'scale' function on only the documents matching the main
> > > >> > > > > > > dismax query. As you mention, it is a slightly intrusive design and
> > > >> > > > > > > requires that I manage my own PriorityQueue (and a local duplicate of
> > > >> > > > > > > HitQueue), but should work. I think a better design would hide the PQ
> > > >> > > > > > > from the plugin.
> > > >> > > > > > >
> > > >> > > > > > > Thanks,
> > > >> > > > > > > Peter
> > > >> > > > > > >
> > > >> > > > > > >
> > > >> > > > > > > On Sun, Dec 8, 2013 at 5:32 PM, Joel Bernstein <[hidden email]> wrote:
> > > >> > > > > > >
> > > >> > > > > > >> Hi Peter,
> > > >> > > > > > >>
> > > >> > > > > > >> I've been meaning to revisit configurable ranking collectors, but I
> > > >> > > > > > >> haven't yet had a chance. It's on the shortlist of things I'd like to
> > > >> > > > > > >> tackle though.
> > > >> > > > > > >>
> > > >> > > > > > >>
> > > >> > > > > > >>
> > > >> > > > > > >> On Fri, Dec 6, 2013 at 4:17 PM, Peter Keegan <[hidden email]> wrote:
> > > >> > > > > > >>
> > > >> > > > > > >> > I looked at SOLR-4465 and SOLR-5045, where it appears that there is a
> > > >> > > > > > >> > goal to be able to do custom sorting and ranking in a PostFilter. So
> > > >> > > > > > >> > far, it looks like only custom aggregation can be implemented in
> > > >> > > > > > >> > PostFilter (5045). Custom sorting/ranking can be done in a pluggable
> > > >> > > > > > >> > collector (4465), but this patch is no longer in dev.
> > > >> > > > > > >> >
> > > >> > > > > > >> > Is there any other dev. being done on adding custom sorting (after
> > > >> > > > > > >> > collection) via a plugin?
> > > >> > > > > > >> >
> > > >> > > > > > >> > Thanks,
> > > >> > > > > > >> > Peter
> > > >> > > > > > >> >
> > > >> > > > > > >>
> > > >> > > > > > >>
> > > >> > > > > > >>
> > > >> > > > > > >> --
> > > >> > > > > > >> Joel Bernstein
> > > >> > > > > > >> Search Engineer at Heliosearch
> > > >> > > > > > >>
> > > >> > > > > > >
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > --
> > > >> > > > > Joel Bernstein
> > > >> > > > > Search Engineer at Heliosearch
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > --
> > > >> > > Joel Bernstein
> > > >> > > Search Engineer at Heliosearch
> > > >> > >
> > > >> >
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> Joel Bernstein
> > > >> Search Engineer at Heliosearch
> > > >>
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > Joel Bernstein
> > Search Engineer at Heliosearch
> >
>