Streaming rollUp vs Streaming facet

RAUNAK AGRAWAL
Hi guys,

I am trying to do an aggregation (sum) using the streaming API. I have around
10 billion documents in my collection, and every document has around 10
docValues fields.

The streaming facet takes close to 6 secs to respond with an aggregation on
10 fields, while the streaming rollup returns the response in 2 secs.

So my questions are:

1. What is the fundamental difference between streaming facet and rollUp?
2. When should I use facet, and when rollUp?

Thanks

Re: Streaming rollUp vs Streaming facet

Joel Bernstein
They are very different.

The "facet" expression sends a request to the JSON facet API which pushes
the aggregation into the search engine. In most scenarios this is the
preferred method because it only streams aggregated results. I would always
try the "facet" expression first before going to rollup.

The "rollup" expression rolls up aggregations over a sorted stream of
tuples. It almost always involves exporting and sorting entire result sets
with the /export handler. There are only two reasons to use this approach:

1) Very high cardinality faceting. By very high I mean millions of facet
values are being returned in the same query.
2) Rollups following any kind of relational algebra. For example a rollup
on top of a hashJoin.
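
To make the contrast concrete, here is a minimal Python sketch. It is not Solr code: it only imitates the two strategies on a handful of in-memory tuples. The collection name (collection1) and field names (a_s, a_i) are hypothetical, and the two expression strings follow the general shape of the facet and rollup expressions, so check the parameters against the Solr version you run.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Hypothetical collection and field names, shown for illustration only.
FACET_EXPR = '''facet(collection1, q="*:*", buckets="a_s",
      bucketSorts="sum(a_i) desc", bucketSizeLimit=100, sum(a_i))'''

ROLLUP_EXPR = '''rollup(search(collection1, q="*:*", qt="/export",
              fl="a_s,a_i", sort="a_s asc"),
       over="a_s", sum(a_i))'''


def rollup_sum(sorted_tuples, over, field):
    """Rollup style: one pass over tuples already sorted by `over`.

    Only the current group's total is held at a time, but every matching
    tuple must first be exported and sorted.
    """
    for key, group in groupby(sorted_tuples, key=itemgetter(over)):
        yield {over: key, f"sum({field})": sum(t[field] for t in group)}


def facet_sum(tuples, over, field):
    """Facet style: accumulate per bucket and return only the buckets."""
    totals = defaultdict(int)
    for t in tuples:
        totals[t[over]] += t[field]
    return [{over: k, f"sum({field})": v} for k, v in sorted(totals.items())]


docs = [
    {"a_s": "x", "a_i": 1},
    {"a_s": "y", "a_i": 10},
    {"a_s": "x", "a_i": 2},
    {"a_s": "y", "a_i": 20},
]

# The sort stands in for what the /export handler does before rollup sees data.
exported = sorted(docs, key=itemgetter("a_s"))
rolled = list(rollup_sum(exported, "a_s", "a_i"))
faceted = facet_sum(docs, "a_s", "a_i")
```

Both paths produce the same sums. The difference is where the work happens: the facet path returns only one row per bucket, while the rollup path has to move and sort every matching tuple before it can aggregate.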


Joel Bernstein
http://joelsolr.blogspot.com/


Re: Streaming rollUp vs Streaming facet

RAUNAK AGRAWAL
Thanks a lot, Joel. This makes sense, but in my use case I am aggregating 10
fields and rollup is performing 2x better than the streaming facet.


Re: Streaming rollUp vs Streaming facet

Joel Bernstein
Your use case is somewhat special in that it involves 10 fields. With that
many nested facets, the JSON facet API may or may not outperform streaming
rollups. For most other cases, the JSON facet API will outperform rollups.




