Heavy operations in PostFilter are heavy


Heavy operations in PostFilter are heavy

Solrmails
Hello,

I tried to write a Solr PostFilter that does its filtering in the 'collect' method (DelegatingCollector). I have to do some heavy operations within the 'collect' method. This isn't a problem for a few results, but unfortunately it takes forever with 50 or more results, because I have to run the checks for every single id separately and can't process a list of ids within 'collect'.

Is there a better place to do post-filtering? I don't want to reimplement the Solr paging/cursor feature to get this to work.

Thank You

Re: Heavy operations in PostFilter are heavy

Alexandre Rafalovitch
Are you doing cache=false and cost >= 100?

See the recent deep-dive article on the topic, if you haven't:
https://lucidworks.com/2017/11/27/caching-and-filters-and-post-filters/
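
For reference, Solr only runs a filter query as a post filter when caching is disabled, the cost is 100 or higher, and the query implements PostFilter. A minimal sketch of building such an fq value follows; the parser name "myfilter" and the helper class are hypothetical, used only for illustration.

```java
// Hypothetical helper, not part of Solr: builds the local-params prefix
// that requests post-filter execution for a custom query parser.
final class PostFilterParam {
    static String postFilterFq(String parserName, int cost) {
        // Solr only treats a filter as a post filter when cost >= 100.
        if (cost < 100) {
            throw new IllegalArgumentException("post filters need cost >= 100");
        }
        return "{!" + parserName + " cache=false cost=" + cost + "}";
    }

    public static void main(String[] args) {
        // "myfilter" is an assumed parser name registered in solrconfig.xml.
        System.out.println(postFilterFq("myfilter", 200));
    }
}
```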

Regards,
   Alex.


Re: Heavy operations in PostFilter are heavy

Solrmails
Yes, I do. The problem is that the collect method is called for EVERY document the query matches, even if the user only wants to see about 10 documents. The operation I have to perform takes maybe 50 ms per document if I have to process them one at a time, and maybe 30 ms per document if I could get a document list. But if the user, e.g., runs a wildcard query that matches maybe 100000 documents, even 25 ms is much too long.

I can't speed up my code any further. Is there another good place to do my checks? I tried to remove the documents at a later stage, but I don't know how to fetch more documents after removing them. (Otherwise I would return maybe only 5 or zero documents even if the user wants 10 and more documents are available.)



Re: Heavy operations in PostFilter are heavy

Chris Hostetter-3

: Yes I do so. The problem is that the collect method is called for EVERY
: document the query matches. Even if the user only wants to see like 10
: documents. The operation I have to perform takes maybe 50ms per document

You're running into a classic chicken/egg problem with document collection
& filtering -- you don't want your expensive filter to be run against every
doc that matches the query (and lower cost filters), just the "top 10" the
user is going to see -- but Solr doesn't know what those top 10 are yet,
not until it has collected & sorted all of them ... and your PostFilter
can change what gets collected ... it's a filter!

Also: things like faceting (and even just returning an accurate numFound!)
require that all matches be "collected" ... unless you are using sorted
segments and early termination, your PostFilter has to be consulted about
every (potential) match in order for the results to be accurate.

: if I have to process them singly. And maybe 30ms if I could get a
: document list. But if the user e.g. uses a wildcard query that matches

If processing in batch is a viable option, then one approach you may want
to consider is the one used by the CollapseQParser and the
PostFilter it generates -- it doesn't pass any collected documents on to
its delegate as it collects them -- it essentially just batches them all
up, and then in the "finish" method it processes them and calls
delegate.collect() on the ones it decides are important.
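
A minimal sketch of that buffer-then-forward pattern, written in plain Java so it runs without Solr on the classpath. DocSink, BatchingFilter and the batchCheck function are hypothetical stand-ins, not Solr classes and not the CollapseQParser code. In a real PostFilter the same shape would live in a DelegatingCollector subclass, and because collect() receives segment-local doc ids there, the collector would also have to remember each segment's docBase and set the delegate to the right segment before forwarding.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical stand-in for the downstream collector (the "delegate").
interface DocSink {
    void collect(int docId);
}

// Buffers every candidate in collect() and defers the expensive check to finish().
final class BatchingFilter implements DocSink {
    private final DocSink delegate;
    // Assumed bulk check: takes all candidate ids, returns the ids that pass.
    private final Function<List<Integer>, Set<Integer>> batchCheck;
    private final List<Integer> buffered = new ArrayList<>();

    BatchingFilter(DocSink delegate, Function<List<Integer>, Set<Integer>> batchCheck) {
        this.delegate = delegate;
        this.batchCheck = batchCheck;
    }

    @Override
    public void collect(int docId) {
        buffered.add(docId); // cheap: nothing is forwarded yet
    }

    // Called once after all matches were seen: one batch check, then forward.
    void finish() {
        Set<Integer> allowed = batchCheck.apply(buffered);
        for (int docId : buffered) { // keep the original collection order
            if (allowed.contains(docId)) {
                delegate.collect(docId);
            }
        }
        buffered.clear();
    }
}

final class BatchingFilterDemo {
    static List<Integer> run() {
        List<Integer> passed = new ArrayList<>();
        // Toy batch check: only even ids are allowed through.
        BatchingFilter filter = new BatchingFilter(docId -> passed.add(docId), ids -> {
            Set<Integer> ok = new HashSet<>();
            for (int id : ids) {
                if (id % 2 == 0) {
                    ok.add(id);
                }
            }
            return ok;
        });
        for (int id = 1; id <= 6; id++) {
            filter.collect(id);
        }
        filter.finish();
        return passed;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Note that this only reduces the per-document overhead by batching; every match is still checked, and all candidate ids are held in memory until finish() runs.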

-Hoss
http://www.lucidworks.com/