per-field count of documents matched?


Fischer, Stephen
Hi wise Solr experts,

For our scientific use case, we want to show users, for each field, a count of the documents that match the query in that field.

We'd like to do this efficiently because we might return up to a million documents.

For example, if we had documents describing People and a query of, say, "Stone", we might want to show:

Fields matched:
  Last name:  145
  Street: 431
  Favorite rock band:  13
  Home exterior: 2340

Is there an efficient way to do this?

So far, we're trying to leverage highlighting, but it seems very slow.

Thanks

Re: per-field count of documents matched?

Erick Erickson
Hmmm, you could do a facet query (or a series of them): facet.query=LastName:stone&facet.query=Street:stone, etc. Facet counts are computed only over the docs that match the main query, so each tally is automatically restricted to matching docs.
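
A minimal sketch of the facet-query idea, in Python. The helper names are hypothetical, and it assumes the search term is a single plain token with no characters that need escaping in Solr query syntax. It only builds the request parameters and reads the `facet_counts.facet_queries` section of a JSON response; sending the request is left to whatever HTTP client you already use.

```python
from urllib.parse import urlencode


def build_facet_query_params(main_query, term, fields):
    """Build a query string with one facet.query per field.

    main_query: the q parameter for the overall search.
    term:       the single-token term to tally per field (assumption).
    fields:     field names such as "LastName" or "Street".
    """
    params = [("q", main_query), ("facet", "true")]
    for field in fields:
        # Each facet.query is counted only over docs matching the main query.
        params.append(("facet.query", f"{field}:{term}"))
    return urlencode(params)


def per_field_counts(solr_response, term, fields):
    """Read the per-field tallies back out of a Solr JSON response."""
    facet_queries = solr_response["facet_counts"]["facet_queries"]
    return {field: facet_queries.get(f"{field}:{term}", 0) for field in fields}
```

The keys under `facet_queries` are the facet.query strings exactly as they were sent, which is why `per_field_counts` rebuilds them the same way.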

You could also consider a custom search component. For the exact case you describe, it’s actually fairly simple. The postings list has, for each term, the list of docs that contain it (by internal Lucene doc ID). So I might have

for field LastName:
stone -> 1,73,100…

for field Street:
stone -> 264,933…

So it’s simply a matter of, for each field and each query term, going down that term’s list of docs and counting the ones the overall query also matches.
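
The counting step can be illustrated with a toy model in Python. This is not Lucene's API: the postings are a plain dict, the doc IDs are just the leading IDs shown above, and the set of matched docs is made up for illustration. A real search component would be written in Java against Lucene's postings enumerations.

```python
def count_matches_per_field(postings, term, matched_doc_ids):
    """For each field, count docs that contain `term` AND match the query.

    postings:        {field: {term: [doc IDs containing the term]}} (toy model)
    matched_doc_ids: doc IDs matched by the overall query
    """
    matched = set(matched_doc_ids)
    counts = {}
    for field, terms in postings.items():
        doc_ids = terms.get(term, [])
        # Walk the postings list and tally only docs in the result set.
        counts[field] = sum(1 for doc_id in doc_ids if doc_id in matched)
    return counts


# Toy data: the leading doc IDs from the example above.
postings = {
    "LastName": {"stone": [1, 73, 100]},
    "Street": {"stone": [264, 933]},
}
matched = [1, 100, 264, 500]  # hypothetical result set of the overall query

print(count_matches_per_field(postings, "stone", matched))
# prints {'LastName': 2, 'Street': 1}
```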

However… I’m not sure you’d get what you want in either case. Consider a query (A AND B) OR (C AND D). And let’s say doc1 contains A in LastName, and C and D in Street. Should A be counted in the LastName tally for this doc?

I suppose you could put the full query in the facet.query above. I’m still not sure it’s what you need, since I’m not sure what “per-field count of documents that match” means in your application…

Best,
Erick

> On Feb 11, 2020, at 6:15 PM, Fischer, Stephen <[hidden email]> wrote:


RE: [External] Re: per-field count of documents matched?

Fischer, Stephen
Thanks very much! By the way, we are using eDisMax, and the queries our UI supports don't include fancy Booleans, so your ideas just might work.

Thanks again,
Steve

-----Original Message-----
From: Erick Erickson <[hidden email]>
Sent: Tuesday, February 11, 2020 7:16 PM
To: [hidden email]
Subject: [External] Re: per-field count of documents matched?



Re: [External] Re: per-field count of documents matched?

Susmit Shukla
I used the JSON Facet API for a similar requirement. It can ignore filters from the main query if needed and roll up the hit counts to any field.
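
A rough sketch of what such a JSON Facet API request body might look like, built in Python. The function names, the facet labels, and the filter tag are hypothetical, and the term is again assumed to be a single plain token. It uses one `query` facet per field, and optionally a `domain` with `excludeTags` so that a tagged filter from the main request is ignored when counting.

```python
import json


def build_json_facet_request(main_query, term, fields, filters=None, exclude_tag=None):
    """Build a JSON request body with one query facet per field."""
    facets = {}
    for field in fields:
        facet = {"type": "query", "q": f"{field}:{term}"}
        if exclude_tag:
            # Ignore main-request filters that carry this tag when counting.
            facet["domain"] = {"excludeTags": exclude_tag}
        facets[field] = facet
    body = {"query": main_query, "facet": facets}
    if filters:
        body["filter"] = list(filters)
    return body


def counts_from_response(response, fields):
    """Each query facet comes back as {"count": N} under "facets"."""
    facets = response.get("facets", {})
    return {field: facets.get(field, {}).get("count", 0) for field in fields}


body = build_json_facet_request(
    "stone", "stone", ["LastName", "Street"],
    filters=["{!tag=TYPE}doc_type:person"],  # hypothetical tagged filter
    exclude_tag="TYPE",
)
print(json.dumps(body, indent=2))
```

Whether to exclude a filter depends on what the counts should mean: with `excludeTags` the tallies cover docs outside the filtered result set too, which may or may not be what the UI wants to show.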


> On Feb 11, 2020, at 6:19 PM, Fischer, Stephen <[hidden email]> wrote: