Limiting by range of sum across documents

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Limiting by range of sum across documents

chris-4



I have documents in solr that look like this:
{
� "id": "acme-1",
� "manufacturer": "acme",
� "product_name": "Foo",
� "price": 3.4
}

There are about
150,000 manufacturers, each of which have between 20,000 and 1,000,000 products.��
I'd like to return the sum of all prices that are in the range [100, 200], faceted by manufacturer.� In other words, for each manufacturer, sum the prices of all products for that manufacturer,
and return the sum and the manufacturer name.� For example:
[
� {
� � "manufacturer": "acme",
� � "sum": 150.5
� },
� {
� � "manufacturer": "Johnson,
Inc.",
� � "sum": 167.0
� },
...
]

I tried this:
q=*:*&rows=0&stats=true&stats.field={!tag=piv1 sum=true}price&facet=true&facet.pivot={!stats=piv1}manufacturer
which "works" on a test
subset of 1,000 manufacturers.� However, there are two problems:
1) This query returns all the manufacturers, so I have to iterate over the entire response object to extract the ones I want.
2) The query on the whole data set takes more than 600 seconds to return, which doesn't fit
our target response time

How can I perform this query?
We're using solr version 5.5.5.
        �


Thanks,
Chris

Reply | Threaded
Open this post in threaded view
|

Re: Limiting by range of sum across documents

Emir Arnautović
Hi Chris,
You mention it returns all manufacturers? Even after you apply filters (don’t see filter in your example)? You can control how many facets are returned with facet.limit and you can use face.pivot.mincount to determine how many facets are returned. If you calculate sum on all manufacturers, it can last.

Maybe you can try json faceting. Something like (url style):

…&json.facet={sumByManu:{terms:{field:manufacturer,facet:{sum:”sum(price)”}}}}

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 12 Nov 2017, at 19:09, [hidden email] wrote:
>
>
>
>
> I have documents in solr that look like this:
> {
>   "id": "acme-1",
>   "manufacturer": "acme",
>   "product_name": "Foo",
>   "price": 3.4
> }
>  
> There are about
> 150,000 manufacturers, each of which have between 20,000 and 1,000,000 products.  
> I'd like to return the sum of all prices that are in the range [100, 200], faceted by manufacturer.  In other words, for each manufacturer, sum the prices of all products for that manufacturer,
> and return the sum and the manufacturer name.  For example:
> [
>   {
>     "manufacturer": "acme",
>     "sum": 150.5
>   },
>   {
>     "manufacturer": "Johnson,
> Inc.",
>     "sum": 167.0
>   },
> ...
> ]
>  
> I tried this:
> q=*:*&rows=0&stats=true&stats.field={!tag=piv1 sum=true}price&facet=true&facet.pivot={!stats=piv1}manufacturer
> which "works" on a test
> subset of 1,000 manufacturers.  However, there are two problems:
> 1) This query returns all the manufacturers, so I have to iterate over the entire response object to extract the ones I want.
> 2) The query on the whole data set takes more than 600 seconds to return, which doesn't fit
> our target response time
>  
> How can I perform this query?
> We're using solr version 5.5.5.
>
>
>  
> Thanks,
> Chris
>  

Reply | Threaded
Open this post in threaded view
|

Re: Limiting by range of sum across documents

chris-4




Hi Emir,
I can't apply filters to the original query because I don't know in advance which filters will meet the criterion I'm looking for.� Unless I'm missing something obvious.��

I tried the JSON facet you suggested but received

        "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]

        � },

        � "facet_counts":{

        � � "facet_queries":{},

        � � "facet_fields":{},

        � � "facet_dates":{},

        � � "facet_ranges":{},

        � � "facet_intervals":{},

        � � "facet_heatmaps":{}},

        � "facets":{

        � � "count":0}}

        �




> Hi Chris,

> You mention it returns all manufacturers? Even after you apply filters (don’t see filter in your example)? You can control how many facets are returned with facet.limit and you can use face.pivot.mincount to determine how many facets are returned. If you calculate sum on all
manufacturers, it can last.
>

> Maybe you can try json faceting. Something like (url style):

>

> …&json.facet={sumByManu:{terms:{field:manufacturer,facet:{sum:”sum(price)”}}}}

>

> HTH,

> Emir

> --

> Monitoring - Log Management - Alerting - Anomaly Detection

> Solr & Elasticsearch Consulting Support Training - http://sematext.com/

>

>

>

>> On 12 Nov 2017, at 19:09, [hidden email] wrote:

>>

>>

>>

>>

>> I have documents in solr that look like this:

>> {

>> "id": "acme-1",

>> "manufacturer": "acme",

>> "product_name": "Foo",

>> "price": 3.4

>> }

>>

>> There are about

>> 150,000 manufacturers, each of which have between 20,000 and 1,000,000 products.

>> I'd like to return the sum of all prices that are in the range [100, 200], faceted by manufacturer. In other words, for each manufacturer, sum the prices of all products for that manufacturer,

>> and return the sum and the manufacturer name. For example:

>> [

>> {

>> "manufacturer": "acme",

>> "sum": 150.5

>> },

>> {

>> "manufacturer": "Johnson,

>> Inc.",

>> "sum": 167.0

>> },

>> ...

>> ]

>>

>> I tried this:

>> q=*:*&rows=0&stats=true&stats.field={!tag=piv1 sum=true}price&facet=true&facet.pivot={!stats=piv1}manufacturer

>> which "works" on a test

>> subset of 1,000 manufacturers. However, there are two problems:

>> 1) This query returns all the manufacturers, so I have to iterate over the entire response object to extract the ones I want.

>> 2) The query on the whole data set takes more than 600 seconds to return, which doesn't fit

>> our target response time

>>

>> How can I perform this query?

>> We're using solr version 5.5.5.

>>

>>

>>

>> Thanks,

>> Chris

>>

>

>
Reply | Threaded
Open this post in threaded view
|

Re: Limiting by range of sum across documents

Emir Arnautović
Hi Chris,

I assumed that you apply some sort of fq=price:[100 TO 200] to focus on wanted products.

Can you share full json faceting request - numFound:0 suggest that something is completely wrong.

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 13 Nov 2017, at 21:56, [hidden email] wrote:
>
>
>
>
> �
> Hi Emir,
> I can't apply filters to the original query because I don't know in advance which filters will meet the criterion I'm looking for.� Unless I'm missing something obvious.�
> �
> I tried the JSON facet you suggested but received
>
> "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
>
> � },
>
> � "facet_counts":{
>
> � � "facet_queries":{},
>
> � � "facet_fields":{},
>
> � � "facet_dates":{},
>
> � � "facet_ranges":{},
>
> � � "facet_intervals":{},
>
> � � "facet_heatmaps":{}},
>
> � "facets":{
>
> � � "count":0}}
>
> �
>
> �
>
>
>> Hi Chris,
>
>> You mention it returns all manufacturers? Even after you apply filters (don’t see filter in your example)? You can control how many facets are returned with facet.limit and you can use face.pivot.mincount to determine how many facets are returned. If you calculate sum on all
> manufacturers, it can last.
>>
>
>> Maybe you can try json faceting. Something like (url style):
>
>>
>
>> …&json.facet={sumByManu:{terms:{field:manufacturer,facet:{sum:”sum(price)”}}}}
>
>>
>
>> HTH,
>
>> Emir
>
>> --
>
>> Monitoring - Log Management - Alerting - Anomaly Detection
>
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>>
>
>>
>
>>
>
>>> On 12 Nov 2017, at 19:09, [hidden email] wrote:
>
>>>
>
>>>
>
>>>
>
>>>
>
>>> I have documents in solr that look like this:
>
>>> {
>
>>> "id": "acme-1",
>
>>> "manufacturer": "acme",
>
>>> "product_name": "Foo",
>
>>> "price": 3.4
>
>>> }
>
>>>
>
>>> There are about
>
>>> 150,000 manufacturers, each of which have between 20,000 and 1,000,000 products.
>
>>> I'd like to return the sum of all prices that are in the range [100, 200], faceted by manufacturer. In other words, for each manufacturer, sum the prices of all products for that manufacturer,
>
>>> and return the sum and the manufacturer name. For example:
>
>>> [
>
>>> {
>
>>> "manufacturer": "acme",
>
>>> "sum": 150.5
>
>>> },
>
>>> {
>
>>> "manufacturer": "Johnson,
>
>>> Inc.",
>
>>> "sum": 167.0
>
>>> },
>
>>> ...
>
>>> ]
>
>>>
>
>>> I tried this:
>
>>> q=*:*&rows=0&stats=true&stats.field={!tag=piv1 sum=true}price&facet=true&facet.pivot={!stats=piv1}manufacturer
>
>>> which "works" on a test
>
>>> subset of 1,000 manufacturers. However, there are two problems:
>
>>> 1) This query returns all the manufacturers, so I have to iterate over the entire response object to extract the ones I want.
>
>>> 2) The query on the whole data set takes more than 600 seconds to return, which doesn't fit
>
>>> our target response time
>
>>>
>
>>> How can I perform this query?
>
>>> We're using solr version 5.5.5.
>
>>>
>
>>>
>
>>>
>
>>> Thanks,
>
>>> Chris
>
>>>
>
>>
>
>>

Reply | Threaded
Open this post in threaded view
|

Re: Limiting by range of sum across documents

chris-4



I'm not looking for products where the price is in the range [100, 200].
I'm looking for manufacturers for which the sum of the prices of all of their products is in the range [100, 200].



> Hi Chris,

>

> I assumed that you apply some sort of fq=price:[100 TO 200] to focus on wanted products.

>

> Can you share full json faceting request - numFound:0 suggest that something is completely wrong.

>

> Thanks,

> Emir

> --

> Monitoring - Log Management - Alerting - Anomaly Detection

> Solr & Elasticsearch Consulting Support Training - http://sematext.com/

>

>

>

>> On 13 Nov 2017, at 21:56, [hidden email] wrote:

>>

>>

>>

>>

>> �

>> Hi Emir,

>> I can't apply filters to the original query because I don't know in advance which filters will meet the criterion I'm looking for.� Unless I'm missing something obvious.�

>> �

>> I tried the JSON facet you suggested but received

>>

>> "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]

>>

>> � },

>>

>> � "facet_counts":{

>>

>> � � "facet_queries":{},

>>

>> � � "facet_fields":{},

>>

>> � � "facet_dates":{},

>>

>> � � "facet_ranges":{},

>>

>> � � "facet_intervals":{},

>>

>> � � "facet_heatmaps":{}},

>>

>> � "facets":{

>>

>> � � "count":0}}

>>

>> �

>>

>> �

>>

>>

>>> Hi Chris,

>>

>>> You mention it returns all manufacturers? Even after you apply filters (don’t see filter in your example)? You can control how many facets are returned with facet.limit and you can use face.pivot.mincount to determine how many facets are returned. If you calculate sum on
all
>> manufacturers, it can last.

>>>

>>

>>> Maybe you can try json faceting. Something like (url style):

>>

>>>

>>

>>> …&json.facet={sumByManu:{terms:{field:manufacturer,facet:{sum:”sum(price)”}}}}

>>

>>>

>>

>>> HTH,

>>

>>> Emir

>>

>>> --

>>

>>> Monitoring - Log Management - Alerting - Anomaly Detection

>>

>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/

>>

>>>

>>

>>>

>>

>>>

>>

>>>> On 12 Nov 2017, at 19:09, [hidden email] wrote:

>>

>>>>

>>

>>>>

>>

>>>>

>>

>>>>

>>

>>>> I have documents in solr that look like this:

>>

>>>> {

>>

>>>> "id": "acme-1",

>>

>>>> "manufacturer": "acme",

>>

>>>> "product_name": "Foo",

>>

>>>> "price": 3.4

>>

>>>> }

>>

>>>>

>>

>>>> There are about

>>

>>>> 150,000 manufacturers, each of which have between 20,000 and 1,000,000 products.

>>

>>>> I'd like to return the sum of all prices that are in the range [100, 200], faceted by manufacturer. In other words, for each manufacturer, sum the prices of all products for that manufacturer,

>>

>>>> and return the sum and the manufacturer name. For example:

>>

>>>> [

>>

>>>> {

>>

>>>> "manufacturer": "acme",

>>

>>>> "sum": 150.5

>>

>>>> },

>>

>>>> {

>>

>>>> "manufacturer": "Johnson,

>>

>>>> Inc.",

>>

>>>> "sum": 167.0

>>

>>>> },

>>

>>>> ...

>>

>>>> ]

>>

>>>>

>>

>>>> I tried this:

>>

>>>> q=*:*&rows=0&stats=true&stats.field={!tag=piv1 sum=true}price&facet=true&facet.pivot={!stats=piv1}manufacturer

>>

>>>> which "works" on a test

>>

>>>> subset of 1,000 manufacturers. However, there are two problems:

>>

>>>> 1) This query returns all the manufacturers, so I have to iterate over the entire response object to extract the ones I want.

>>

>>>> 2) The query on the whole data set takes more than 600 seconds to return, which doesn't fit

>>

>>>> our target response time

>>

>>>>

>>

>>>> How can I perform this query?

>>

>>>> We're using solr version 5.5.5.

>>

>>>>

>>

>>>>

>>

>>>>

>>

>>>> Thanks,

>>

>>>> Chris

>>

>>>>

>>

>>>

>>

>>>

>

>
Reply | Threaded
Open this post in threaded view
|

Re: Limiting by range of sum across documents

Erick Erickson
Maybe pivot facets and stats? See:
https://lucene.apache.org/solr/guide/6_6/faceting.html

On Tue, Nov 14, 2017 at 7:51 AM,  <[hidden email]> wrote:

>
>
>
> I'm not looking for products where the price is in the range [100, 200].
> I'm looking for manufacturers for which the sum of the prices of all of their products is in the range [100, 200].
>
>
>
>> Hi Chris,
>
>>
>
>> I assumed that you apply some sort of fq=price:[100 TO 200] to focus on wanted products.
>
>>
>
>> Can you share full json faceting request - numFound:0 suggest that something is completely wrong.
>
>>
>
>> Thanks,
>
>> Emir
>
>> --
>
>> Monitoring - Log Management - Alerting - Anomaly Detection
>
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>>
>
>>
>
>>
>
>>> On 13 Nov 2017, at 21:56, [hidden email] wrote:
>
>>>
>
>>>
>
>>>
>
>>>
>
>>> �
>
>>> Hi Emir,
>
>>> I can't apply filters to the original query because I don't know in advance which filters will meet the criterion I'm looking for.� Unless I'm missing something obvious.�
>
>>> �
>
>>> I tried the JSON facet you suggested but received
>
>>>
>
>>> "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
>
>>>
>
>>> � },
>
>>>
>
>>> � "facet_counts":{
>
>>>
>
>>> � � "facet_queries":{},
>
>>>
>
>>> � � "facet_fields":{},
>
>>>
>
>>> � � "facet_dates":{},
>
>>>
>
>>> � � "facet_ranges":{},
>
>>>
>
>>> � � "facet_intervals":{},
>
>>>
>
>>> � � "facet_heatmaps":{}},
>
>>>
>
>>> � "facets":{
>
>>>
>
>>> � � "count":0}}
>
>>>
>
>>> �
>
>>>
>
>>> �
>
>>>
>
>>>
>
>>>> Hi Chris,
>
>>>
>
>>>> You mention it returns all manufacturers? Even after you apply filters (don&rsquo;t see filter in your example)? You can control how many facets are returned with facet.limit and you can use face.pivot.mincount to determine how many facets are returned. If you calculate sum on
> all
>>> manufacturers, it can last.
>
>>>>
>
>>>
>
>>>> Maybe you can try json faceting. Something like (url style):
>
>>>
>
>>>>
>
>>>
>
>>>> &hellip;&json.facet={sumByManu:{terms:{field:manufacturer,facet:{sum:&rdquo;sum(price)&rdquo;}}}}
>
>>>
>
>>>>
>
>>>
>
>>>> HTH,
>
>>>
>
>>>> Emir
>
>>>
>
>>>> --
>
>>>
>
>>>> Monitoring - Log Management - Alerting - Anomaly Detection
>
>>>
>
>>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>>>
>
>>>>
>
>>>
>
>>>>
>
>>>
>
>>>>
>
>>>
>
>>>>> On 12 Nov 2017, at 19:09, [hidden email] wrote:
>
>>>
>
>>>>>
>
>>>
>
>>>>>
>
>>>
>
>>>>>
>
>>>
>
>>>>>
>
>>>
>
>>>>> I have documents in solr that look like this:
>
>>>
>
>>>>> {
>
>>>
>
>>>>> "id": "acme-1",
>
>>>
>
>>>>> "manufacturer": "acme",
>
>>>
>
>>>>> "product_name": "Foo",
>
>>>
>
>>>>> "price": 3.4
>
>>>
>
>>>>> }
>
>>>
>
>>>>>
>
>>>
>
>>>>> There are about
>
>>>
>
>>>>> 150,000 manufacturers, each of which have between 20,000 and 1,000,000 products.
>
>>>
>
>>>>> I'd like to return the sum of all prices that are in the range [100, 200], faceted by manufacturer. In other words, for each manufacturer, sum the prices of all products for that manufacturer,
>
>>>
>
>>>>> and return the sum and the manufacturer name. For example:
>
>>>
>
>>>>> [
>
>>>
>
>>>>> {
>
>>>
>
>>>>> "manufacturer": "acme",
>
>>>
>
>>>>> "sum": 150.5
>
>>>
>
>>>>> },
>
>>>
>
>>>>> {
>
>>>
>
>>>>> "manufacturer": "Johnson,
>
>>>
>
>>>>> Inc.",
>
>>>
>
>>>>> "sum": 167.0
>
>>>
>
>>>>> },
>
>>>
>
>>>>> ...
>
>>>
>
>>>>> ]
>
>>>
>
>>>>>
>
>>>
>
>>>>> I tried this:
>
>>>
>
>>>>> q=*:*&rows=0&stats=true&stats.field={!tag=piv1 sum=true}price&facet=true&facet.pivot={!stats=piv1}manufacturer
>
>>>
>
>>>>> which "works" on a test
>
>>>
>
>>>>> subset of 1,000 manufacturers. However, there are two problems:
>
>>>
>
>>>>> 1) This query returns all the manufacturers, so I have to iterate over the entire response object to extract the ones I want.
>
>>>
>
>>>>> 2) The query on the whole data set takes more than 600 seconds to return, which doesn't fit
>
>>>
>
>>>>> our target response time
>
>>>
>
>>>>>
>
>>>
>
>>>>> How can I perform this query?
>
>>>
>
>>>>> We're using solr version 5.5.5.
>
>>>
>
>>>>>
>
>>>
>
>>>>>
>
>>>
>
>>>>>
>
>>>
>
>>>>> Thanks,
>
>>>
>
>>>>> Chris
>
>>>
>
>>>>>
>
>>>
>
>>>>
>
>>>
>
>>>>
>
>>
>
>>
Reply | Threaded
Open this post in threaded view
|

Re: Limiting by range of sum across documents

Emir Arnautović
In reply to this post by chris-4
Hi Chris,
I misunderstood your requirement. I am not aware of some facet result filtering feature. What you could do is sort facet results by sum and load page by page but that does not sound like a good solution. Did you try using streaming expressions - I don’t have much experience with this feature so would have to play a bit before giving answer if possible and how to do it, but I guess someone will be able to give some pointers.

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 14 Nov 2017, at 16:51, [hidden email] wrote:
>
>
>
>
> I'm not looking for products where the price is in the range [100, 200].
> I'm looking for manufacturers for which the sum of the prices of all of their products is in the range [100, 200].
> �
>
>
>> Hi Chris,
>
>>
>
>> I assumed that you apply some sort of fq=price:[100 TO 200] to focus on wanted products.
>
>>
>
>> Can you share full json faceting request - numFound:0 suggest that something is completely wrong.
>
>>
>
>> Thanks,
>
>> Emir
>
>> --
>
>> Monitoring - Log Management - Alerting - Anomaly Detection
>
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>>
>
>>
>
>>
>
>>> On 13 Nov 2017, at 21:56, [hidden email] wrote:
>
>>>
>
>>>
>
>>>
>
>>>
>
>>> �
>
>>> Hi Emir,
>
>>> I can't apply filters to the original query because I don't know in advance which filters will meet the criterion I'm looking for.� Unless I'm missing something obvious.�
>
>>> �
>
>>> I tried the JSON facet you suggested but received
>
>>>
>
>>> "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
>
>>>
>
>>> � },
>
>>>
>
>>> � "facet_counts":{
>
>>>
>
>>> � � "facet_queries":{},
>
>>>
>
>>> � � "facet_fields":{},
>
>>>
>
>>> � � "facet_dates":{},
>
>>>
>
>>> � � "facet_ranges":{},
>
>>>
>
>>> � � "facet_intervals":{},
>
>>>
>
>>> � � "facet_heatmaps":{}},
>
>>>
>
>>> � "facets":{
>
>>>
>
>>> � � "count":0}}
>
>>>
>
>>> �
>
>>>
>
>>> �
>
>>>
>
>>>
>
>>>> Hi Chris,
>
>>>
>
>>>> You mention it returns all manufacturers? Even after you apply filters (don&rsquo;t see filter in your example)? You can control how many facets are returned with facet.limit and you can use face.pivot.mincount to determine how many facets are returned. If you calculate sum on
> all
>>> manufacturers, it can last.
>
>>>>
>
>>>
>
>>>> Maybe you can try json faceting. Something like (url style):
>
>>>
>
>>>>
>
>>>
>
>>>> &hellip;&json.facet={sumByManu:{terms:{field:manufacturer,facet:{sum:&rdquo;sum(price)&rdquo;}}}}
>
>>>
>
>>>>
>
>>>
>
>>>> HTH,
>
>>>
>
>>>> Emir
>
>>>
>
>>>> --
>
>>>
>
>>>> Monitoring - Log Management - Alerting - Anomaly Detection
>
>>>
>
>>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>>>
>
>>>>
>
>>>
>
>>>>
>
>>>
>
>>>>
>
>>>
>
>>>>> On 12 Nov 2017, at 19:09, [hidden email] wrote:
>
>>>
>
>>>>>
>
>>>
>
>>>>>
>
>>>
>
>>>>>
>
>>>
>
>>>>>
>
>>>
>
>>>>> I have documents in solr that look like this:
>
>>>
>
>>>>> {
>
>>>
>
>>>>> "id": "acme-1",
>
>>>
>
>>>>> "manufacturer": "acme",
>
>>>
>
>>>>> "product_name": "Foo",
>
>>>
>
>>>>> "price": 3.4
>
>>>
>
>>>>> }
>
>>>
>
>>>>>
>
>>>
>
>>>>> There are about
>
>>>
>
>>>>> 150,000 manufacturers, each of which have between 20,000 and 1,000,000 products.
>
>>>
>
>>>>> I'd like to return the sum of all prices that are in the range [100, 200], faceted by manufacturer. In other words, for each manufacturer, sum the prices of all products for that manufacturer,
>
>>>
>
>>>>> and return the sum and the manufacturer name. For example:
>
>>>
>
>>>>> [
>
>>>
>
>>>>> {
>
>>>
>
>>>>> "manufacturer": "acme",
>
>>>
>
>>>>> "sum": 150.5
>
>>>
>
>>>>> },
>
>>>
>
>>>>> {
>
>>>
>
>>>>> "manufacturer": "Johnson,
>
>>>
>
>>>>> Inc.",
>
>>>
>
>>>>> "sum": 167.0
>
>>>
>
>>>>> },
>
>>>
>
>>>>> ...
>
>>>
>
>>>>> ]
>
>>>
>
>>>>>
>
>>>
>
>>>>> I tried this:
>
>>>
>
>>>>> q=*:*&rows=0&stats=true&stats.field={!tag=piv1 sum=true}price&facet=true&facet.pivot={!stats=piv1}manufacturer
>
>>>
>
>>>>> which "works" on a test
>
>>>
>
>>>>> subset of 1,000 manufacturers. However, there are two problems:
>
>>>
>
>>>>> 1) This query returns all the manufacturers, so I have to iterate over the entire response object to extract the ones I want.
>
>>>
>
>>>>> 2) The query on the whole data set takes more than 600 seconds to return, which doesn't fit
>
>>>
>
>>>>> our target response time
>
>>>
>
>>>>>
>
>>>
>
>>>>> How can I perform this query?
>
>>>
>
>>>>> We're using solr version 5.5.5.
>
>>>
>
>>>>>
>
>>>
>
>>>>>
>
>>>
>
>>>>>
>
>>>
>
>>>>> Thanks,
>
>>>
>
>>>>> Chris
>
>>>
>
>>>>>
>
>>>
>
>>>>
>
>>>
>
>>>>
>
>>
>
>>

Reply | Threaded
Open this post in threaded view
|

Re: Limiting by range of sum across documents

chris-4



Emir,
It certainly seems like I'll need to use streaming expressions.
Thanks for your help!
Chris


> Hi Chris,

> I misunderstood your requirement. I am not aware of some facet result filtering feature. What you could do is sort facet results by sum and load page by page but that does not sound like a good solution. Did you try using streaming expressions - I don&rsquo;t have much experience with this
feature so would have to play a bit before giving answer if possible and how to do it, but I guess someone will be able to give some pointers.
>

> Thanks,

> Emir

> --

> Monitoring - Log Management - Alerting - Anomaly Detection

> Solr & Elasticsearch Consulting Support Training - http://sematext.com/

>

>

>

>> On 14 Nov 2017, at 16:51, [hidden email] wrote:

>>

>>

>>

>>

>> I'm not looking for products where the price is in the range [100, 200].

>> I'm looking for manufacturers for which the sum of the prices of all of their products is in the range [100, 200].

>> �

>>

>>

>>> Hi Chris,

>>

>>>

>>

>>> I assumed that you apply some sort of fq=price:[100 TO 200] to focus on wanted products.

>>

>>>

>>

>>> Can you share full json faceting request - numFound:0 suggest that something is completely wrong.

>>

>>>

>>

>>> Thanks,

>>

>>> Emir

>>

>>> --

>>

>>> Monitoring - Log Management - Alerting - Anomaly Detection

>>

>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/

>>

>>>

>>

>>>

>>

>>>

>>

>>>> On 13 Nov 2017, at 21:56, [hidden email] wrote:

>>

>>>>

>>

>>>>

>>

>>>>

>>

>>>>

>>

>>>> �

>>

>>>> Hi Emir,

>>

>>>> I can't apply filters to the original query because I don't know in advance which filters will meet the criterion I'm looking for.� Unless I'm missing something obvious.�

>>

>>>> �

>>

>>>> I tried the JSON facet you suggested but received

>>

>>>>

>>

>>>> "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]

>>

>>>>

>>

>>>> � },

>>

>>>>

>>

>>>> � "facet_counts":{

>>

>>>>

>>

>>>> � � "facet_queries":{},

>>

>>>>

>>

>>>> � � "facet_fields":{},

>>

>>>>

>>

>>>> � � "facet_dates":{},

>>

>>>>

>>

>>>> � � "facet_ranges":{},

>>

>>>>

>>

>>>> � � "facet_intervals":{},

>>

>>>>

>>

>>>> � � "facet_heatmaps":{}},

>>

>>>>

>>

>>>> � "facets":{

>>

>>>>

>>

>>>> � � "count":0}}

>>

>>>>

>>

>>>> �

>>

>>>>

>>

>>>> �

>>

>>>>

>>

>>>>

>>

>>>>> Hi Chris,

>>

>>>>

>>

>>>>> You mention it returns all manufacturers? Even after you apply filters (don&rsquo;t see filter in your example)? You can control how many facets are returned with facet.limit and you can use face.pivot.mincount to determine how many facets are returned. If you calculate sum
on
>> all

>>>> manufacturers, it can last.

>>

>>>>>

>>

>>>>

>>

>>>>> Maybe you can try json faceting. Something like (url style):

>>

>>>>

>>

>>>>>

>>

>>>>

>>

>>>>> &hellip;&json.facet={sumByManu:{terms:{field:manufacturer,facet:{sum:&rdquo;sum(price)&rdquo;}}}}

>>

>>>>

>>

>>>>>

>>

>>>>

>>

>>>>> HTH,

>>

>>>>

>>

>>>>> Emir

>>

>>>>

>>

>>>>> --

>>

>>>>

>>

>>>>> Monitoring - Log Management - Alerting - Anomaly Detection

>>

>>>>

>>

>>>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/

>>

>>>>

>>

>>>>>

>>

>>>>

>>

>>>>>

>>

>>>>

>>

>>>>>

>>

>>>>

>>

>>>>>> On 12 Nov 2017, at 19:09, [hidden email] wrote:

>>

>>>>

>>

>>>>>>

>>

>>>>

>>

>>>>>>

>>

>>>>

>>

>>>>>>

>>

>>>>

>>

>>>>>>

>>

>>>>

>>

>>>>>> I have documents in solr that look like this:

>>

>>>>

>>

>>>>>> {

>>

>>>>

>>

>>>>>> "id": "acme-1",

>>

>>>>

>>

>>>>>> "manufacturer": "acme",

>>

>>>>

>>

>>>>>> "product_name": "Foo",

>>

>>>>

>>

>>>>>> "price": 3.4

>>

>>>>

>>

>>>>>> }

>>

>>>>

>>

>>>>>>

>>

>>>>

>>

>>>>>> There are about

>>

>>>>

>>

>>>>>> 150,000 manufacturers, each of which have between 20,000 and 1,000,000 products.

>>

>>>>

>>

>>>>>> I'd like to return the sum of all prices that are in the range [100, 200], faceted by manufacturer. In other words, for each manufacturer, sum the prices of all products for that manufacturer,

>>

>>>>

>>

>>>>>> and return the sum and the manufacturer name. For example:

>>

>>>>

>>

>>>>>> [

>>

>>>>

>>

>>>>>> {

>>

>>>>

>>

>>>>>> "manufacturer": "acme",

>>

>>>>

>>

>>>>>> "sum": 150.5

>>

>>>>

>>

>>>>>> },

>>

>>>>

>>

>>>>>> {

>>

>>>>

>>

>>>>>> "manufacturer": "Johnson,

>>

>>>>

>>

>>>>>> Inc.",

>>

>>>>

>>

>>>>>> "sum": 167.0

>>

>>>>

>>

>>>>>> },

>>

>>>>

>>

>>>>>> ...

>>

>>>>

>>

>>>>>> ]

>>

>>>>

>>

>>>>>>

>>

>>>>

>>

>>>>>> I tried this:

>>

>>>>

>>

>>>>>> q=*:*&rows=0&stats=true&stats.field={!tag=piv1 sum=true}price&facet=true&facet.pivot={!stats=piv1}manufacturer

>>

>>>>

>>

>>>>>> which "works" on a test

>>

>>>>

>>

>>>>>> subset of 1,000 manufacturers. However, there are two problems:

>>

>>>>

>>

>>>>>> 1) This query returns all the manufacturers, so I have to iterate over the entire response object to extract the ones I want.

>>

>>>>

>>

>>>>>> 2) The query on the whole data set takes more than 600 seconds to return, which doesn't fit

>>

>>>>

>>

>>>>>> our target response time

>>

>>>>

>>

>>>>>>

>>

>>>>

>>

>>>>>> How can I perform this query?

>>

>>>>

>>

>>>>>> We're using solr version 5.5.5.

>>

>>>>

>>

>>>>>>

>>

>>>>

>>

>>>>>>

>>

>>>>

>>

>>>>>>

>>

>>>>

>>

>>>>>> Thanks,

>>

>>>>

>>

>>>>>> Chris

>>

>>>>

>>

>>>>>>

>>

>>>>

>>

>>>>>

>>

>>>>

>>

>>>>>

>>

>>>

>>

>>>

>

>