Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?


Kumaran Ramasubramanian
Hi All,
      We all know that Lucene supports faceting by providing Taxonomy
(separate index and hierarchical facets) and SortedSetDocValuesFacetField
(flat facets and no sidecar index).

      Then why did Solr and Elasticsearch go for their own implementations
(that is, Solr uses block join and Elasticsearch uses aggregations)? Are
there any limitations in Lucene's implementation?


--
Kumaran R

Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?

Shai Erera
Hi

The reason IMO is historic - ES and Solr had faceting solutions before
Lucene had one. There were discussions in the past about using the Lucene
faceting module in Solr (can't tell for ES) but, sadly, I can't say I see
it happening at this point.

Regarding your other question, IMO the Lucene faceting engine, in terms of
performance and customizability, is on par with Solr/ES. However, it lacks
distributed faceting support and aggregations. Since many people use
Solr/ES and not Lucene directly, the Solr/ES faceting module continues to
advance separately from the Lucene one.

Enhancing Lucene facets with aggregations and even distributed faceting
capabilities is mostly a matter of time and priorities. If you're
interested in it, I'd be willing to collaborate with you on that as much as
I can!

And I'd still hope that this work finds its way into Solr/ES, as I think
it's silly to have that many faceting implementations, when they all rely
on the same low-level data structure - Lucene!

Shai


On Thu, Nov 10, 2016 at 12:32 PM Kumaran Ramasubramanian <[hidden email]>
wrote:

> [...]

Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?

Chitra R
Hi Shai,

   i) I assume that when a SortedSetDocValuesReaderState is opened, ordinals
(which are later used to calculate facet counts) are calculated for the doc
values field, and that this is what makes the state instance somewhat costly.
Am I right, or is there another reason behind that?

   ii) During indexing, we provide facet ordinals in each doc, and I think
this is useful on the search side, to calculate facet counts only for
matching docs. Does it carry any other benefits?

   iii) Is SortedSetDocValuesReaderState thread-safe, i.e. can multiple
threads use the same instance concurrently?

Kindly post your suggestions.


Thanks,

Chitra


On Thu, Nov 10, 2016 at 4:34 PM, Shai Erera <[hidden email]> wrote:

> [...]

Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?

Michael McCandless-2
On Fri, Nov 11, 2016 at 5:21 AM, Chitra R <[hidden email]> wrote:

> i) I assume that when a SortedSetDocValuesReaderState is opened, ordinals
> (which are later used to calculate facet counts) are calculated for the doc
> values field, and that this is what makes the state instance somewhat costly.
> Am I right, or is there another reason behind that?

That's correct.  It adds some latency to an NRT refresh, and some heap
used to hold the ordinal mappings.

> ii) During indexing, we provide facet ordinals in each doc, and I think
> this is useful on the search side, to calculate facet counts only for
> matching docs. Does it carry any other benefits?

Well, compared to the taxonomy facets, SSDV facets don't require a
separate index.

But they add latency/heap usage, and they cannot do hierarchical
facets yet (though this could be fixed if someone just built it).

> iii) Is SortedSetDocValuesReaderState thread-safe, i.e. can multiple
> threads use the same instance concurrently?

Yes.

Mike McCandless

http://blog.mikemccandless.com
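
To make the cost discussed above a bit more concrete, here is a minimal sketch in plain Java (no Lucene dependency). It models the kind of work that has to happen when a reader state is opened: each segment has its own sorted label dictionary with segment-local ordinals, and those have to be mapped into one global ordinal space before counts from different segments can be added up. GlobalOrdinalSketch and its fields are hypothetical names used only for illustration; this is a simplified model under that assumption, not Lucene's actual implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Simplified model: per-segment ordinals merged into one global ordinal space.
// All names here are hypothetical, not Lucene API.
class GlobalOrdinalSketch {
    final List<String> globalTerms;  // sorted, de-duplicated labels across all segments
    final int[][] segmentToGlobal;   // segmentToGlobal[seg][segmentOrd] -> global ordinal

    // Each inner list is one segment's sorted, de-duplicated label dictionary;
    // a label's position in that list is its segment-local ordinal.
    GlobalOrdinalSketch(List<List<String>> segmentTerms) {
        TreeSet<String> merged = new TreeSet<>();
        for (List<String> terms : segmentTerms) {
            merged.addAll(terms);
        }
        globalTerms = new ArrayList<>(merged);

        segmentToGlobal = new int[segmentTerms.size()][];
        for (int seg = 0; seg < segmentTerms.size(); seg++) {
            List<String> terms = segmentTerms.get(seg);
            segmentToGlobal[seg] = new int[terms.size()];
            for (int ord = 0; ord < terms.size(); ord++) {
                // globalTerms is sorted, so binary search finds the global ordinal.
                segmentToGlobal[seg][ord] =
                        Collections.binarySearch(globalTerms, terms.get(ord));
            }
        }
    }
}
```

In this model the mapping has to be rebuilt whenever the set of segments changes, which is where the extra NRT refresh latency and heap usage would come from.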

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?

Chitra R
Hey, thank you so much for the fast response. I agree that NRT refresh is a
somewhat costly operation, and that this is the major pitfall if we use doc
values faceting.

When a SortedSetDocValuesFacetField is indexed, it stores the path and
dimension of the given field internally. So can we achieve hierarchical
facets using DrillDownQuery? I assume the purpose of storing path and
dimension is to achieve hierarchical facets. If so (i.e. we can achieve
hierarchy in SSDVFF), what is the need to move over to taxonomy? Or have I
missed anything?

What is the real purpose of storing path and dimension in an SSDVF field?

Kindly post your suggestions.

Regards,
Chitra



On Sat, Nov 12, 2016 at 4:03 AM, Michael McCandless <[hidden email]> wrote:

> [...]

Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?

Michael McCandless-2
No, SSDVFF does not do hierarchical faceting today, but this is just a
limitation of the current implementation, and with some changes
(patches welcome!), it could do so.

Mike McCandless

http://blog.mikemccandless.com


On Mon, Nov 14, 2016 at 1:38 AM, Chitra R <[hidden email]> wrote:

> [...]

Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?

Chitra R
In reply to this post by Chitra R
Hi,

Lucene-Drill sideways
<http://blog.mikemccandless.com/2013/02/drill-sideways-faceting-with-lucene.html>

jira_issue:LUCENE-4748 <https://issues.apache.org/jira/browse/LUCENE-4748>

Is this the reason (i.e. drill sideways makes a very nice faceted search UI
because we don't "lose" the facet counts after drilling in) behind storing
path and dimension for the given SSDVF field? Or is there anything else?

Regards,
Chitra


Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?

Michael McCandless-2
You store dimension + string (a single value path, since it's not
hierarchical) into SSDVFF so that you can compute facet counts, either
ordinary drill down counts or the drill sideways counts.

You can see examples of drill sideways at
http://jirasearch.mikemccandless.com, e.g. drill down on any of those
fields on the left and you don't lose the previous facet counts for
that field.

Mike McCandless

http://blog.mikemccandless.com
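
The difference between ordinary drill-down counts and drill-sideways counts can be sketched in plain Java over an in-memory list of documents, each with one value per dimension. This is only an illustration of the counting semantics, assuming flat (dimension, value) pairs as described above; DrillSidewaysSketch and its methods are hypothetical names and do not reflect how Lucene's DrillSideways is implemented.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrates counting semantics only; all names are hypothetical, not Lucene API.
class DrillSidewaysSketch {
    // Counts the values of `dim` over docs that match every selection in `selected`,
    // skipping the selection on `ignoreDim` (pass null to apply all selections).
    static Map<String, Integer> count(List<Map<String, String>> docs,
                                      Map<String, String> selected,
                                      String dim, String ignoreDim) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> doc : docs) {
            boolean matches = true;
            for (Map.Entry<String, String> sel : selected.entrySet()) {
                if (sel.getKey().equals(ignoreDim)) {
                    continue; // sideways: pretend this dimension was not drilled into
                }
                if (!sel.getValue().equals(doc.get(sel.getKey()))) {
                    matches = false;
                    break;
                }
            }
            String value = doc.get(dim);
            if (matches && value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Ordinary drill-down: every selection restricts the counts.
    static Map<String, Integer> drillDown(List<Map<String, String>> docs,
                                          Map<String, String> selected, String dim) {
        return count(docs, selected, dim, null);
    }

    // Drill-sideways: the selection on `dim` itself is ignored when counting `dim`.
    static Map<String, Integer> drillSideways(List<Map<String, String>> docs,
                                              Map<String, String> selected, String dim) {
        return count(docs, selected, dim, dim);
    }
}
```

With a selection on one dimension, drillDown returns only the selected value for that dimension, while drillSideways still returns its sibling values, which is what keeps the other choices visible in the UI.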


On Wed, Nov 16, 2016 at 8:51 AM, Chitra R <[hidden email]> wrote:

> [...]

Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?

Chitra R
Okay, I agree with you: Taxonomy maintains and supports hierarchical facets
during indexing. By hierarchical, I assume we mean that the field
Publish date: 2010/10/15 might be indexed as Publish date: 2010,
Publish date: 2010/10 and Publish date: 2010/10/15; their facet ordinals
are maintained in the sidecar index and mapped to the main index.

For example:

On search-lucene.com, I enter a term (say "facet"), and the top documents
and their categories are displayed after the search is performed. Say I
drill down through Publish date/2010 to collect its child counts, and then
through Publish date/2010/10 to collect its child counts. For each
drill-down, a new search is performed to collect its top docs and
categories.

*I can achieve even this with flat facets by changing the drill-down
query.*

Am I right, or have I missed anything?

So what is the need for hierarchical facets? Could you please explain them
(hierarchical facets) with a real-world use case?


Regards,
Chitra

On Wed, Nov 16, 2016 at 7:36 PM, Michael McCandless <[hidden email]> wrote:

> [...]

Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?

Chitra R
Case 1:
        In taxonomy, for each indexed document, the facet labels are
examined and their ordinals and mappings are computed and stored in the
sidecar index at index time.

Case 2:
        With doc values, these (ordinals) are computed at search time, so I
assume there is a time and memory trade-off between the two cases.


In taxonomy, building hierarchical facets at index time makes the faceting
cost at search time lower than with flat facets in doc values.

Apart from memory, time and NRT latency, is there any other difference
between hierarchical and flat facets at search time?


Kindly post your suggestions...


Regards,
Chitra

On Thu, Nov 17, 2016 at 6:40 PM, Chitra R <[hidden email]> wrote:

> [...]

Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?

Michael McCandless-2
I think you've summed up exactly the differences!

And, yes, it would be possible to emulate hierarchical facets on top
of flat facets, if the hierarchy is fixed depth like year/month/day.

But if it's variable depth, it's trickier (but I think still
possible).  See e.g. the Committed Paths drill-down on the left, on
our dog-food server
http://jirasearch.mikemccandless.com/search.py?index=jira

Mike McCandless

http://blog.mikemccandless.com
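
One possible way to emulate a hierarchy on top of flat facets is to store every prefix of a path as its own flat label (2010, 2010/10, 2010/10/15) and, when drilling down, count only the labels exactly one level below the current path. The plain-Java sketch below illustrates that idea; FlatHierarchySketch, the "/" separator and the in-memory list of paths are assumptions made for the example, not something Lucene provides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Emulates hierarchical facets with flat labels; all names are hypothetical.
class FlatHierarchySketch {
    // "2010/10/15" -> ["2010", "2010/10", "2010/10/15"]; each prefix is one flat label.
    static List<String> prefixes(String path) {
        List<String> out = new ArrayList<>();
        int slash = path.indexOf('/');
        while (slash >= 0) {
            out.add(path.substring(0, slash));
            slash = path.indexOf('/', slash + 1);
        }
        out.add(path);
        return out;
    }

    // Counts the immediate children of `parent` (use "" for the top level),
    // given one full path per matching document.
    static Map<String, Integer> childCounts(List<String> docPaths, String parent) {
        Map<String, Integer> counts = new TreeMap<>();
        String prefix = parent.isEmpty() ? "" : parent + "/";
        for (String path : docPaths) {
            for (String label : prefixes(path)) {
                boolean underParent = label.startsWith(prefix) && label.length() > prefix.length();
                // A direct child has no further separator after the parent prefix.
                if (underParent && label.indexOf('/', prefix.length()) < 0) {
                    counts.merge(label, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

Because the prefixes are generated per document, the same approach also covers paths of different depths, at the cost of storing more labels per document.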


On Fri, Nov 18, 2016 at 1:43 AM, Chitra R <[hidden email]> wrote:

> case 1:
>         In taxonomy, for each indexed document, examines facet label ,
> computes their ordinals and mappings, and which will be stored in sidecar
> index at index time.
>
> case 2:
>         In doc values, these(ordinals) are computed at search time, so there
> will be a time and memory trade-off between both cases, hope so.
>
>
> In taxonomy, building hierarchical facets at index time makes faceting cost
> minimal at search time than flat facets in doc values.
>
> Except (memory,time and NRT latency) , Is any another contrast between
> hierarchical and flat facets at search time?
>
>
> Kindly post your suggestions...
>
>
> Regards,
> Chitra
>
> On Thu, Nov 17, 2016 at 6:40 PM, Chitra R <[hidden email]> wrote:
>>
>> Okay. I agree with you, Taxonomy maintains and supports hierarchical
>> facets during indexing. Hope hierarchical in the sense, we might index the
>> field Publish date : 2010/10/15 as Publish date: 2010 , Publish date:
>> 2010/10 and Publish date: 2010/10/15 , their facet ordinals are maintained
>> in sidecar index and it is mapped to the main index.
>>
>> For example:
>>
>>                 In search-lucene.com , I enter a term (say facet), top
>> documents and their categories are displayed after performing the search.
>> Say I drill down through Publish date/2010 to collect its child counts and
>> after I will pass through publishdate/2010/10 to collect their child counts.
>> And for each drill down, each search will be performed to collect its top
>> docs and categories.
>>
>>
>>                Even I can achieve this in flat facets by changing the
>> drill down query.
>>
>> Am I right or missed anything? yet I don't know if I missed anything...
>>
>> So What is the need of hierarchical facets? Could you please explain
>> it(hierarchical facets) in the real-world use case?
>>
>>
>> Regards,
>> Chitra
>>
>> On Wed, Nov 16, 2016 at 7:36 PM, Michael McCandless
>> <[hidden email]> wrote:
>>>
>>> You store dimension + string (a single value path, since it's not
>>> hierarchical) into SSDVFF so that you can compute facet counts, either
>>> ordinary drill down counts or the drill sideways counts.
>>>
>>> You can see examples of drill sideways at
>>> http://jirasearch.mikemccandless.com, e.g. drill down on any of those
>>> fields on the left and you don't lose the previous facet counts for
>>> that field.
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>>


Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?

Chitra R
Hey, I understand it clearly now. Thank you so much. Could you please help us
implement it for our use case?


In our case the index is dynamic and the depth is variable too, so flat
facets are enough; we have no need for hierarchical facets.

What I have in mind is:


   1. Index my facet field as a normal doc values field, so that no special
   operation (as with taxonomy or SortedSetDocValuesFacetField) is done at
   index time, and each doc values field stores only its own ordinals.
   2. At search time, pass the query (the user's search query) and a filter
   (the list of paths traversed so far), and collect the matching documents
   in a FacetsCollector.
   3. To compute facet counts for a specific field, take those matching
   docs and walk through each segment, collecting the matching ordinals
   using AtomicReader.
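
The three steps above can be sketched in plain Java. This is only a model
under stated assumptions, not Lucene code: Segment, count and demoCounts are
hypothetical names, a segment is reduced to a sorted term dictionary plus
per-document ordinals, and the matching docs (which in Lucene would come from
the FacetsCollector) are given as plain int arrays. It counts by
segment-local ordinal and then merges segments by label, because the same
label can have a different ordinal in each segment.

```java
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java model of per-segment doc-values facet counting (not Lucene API). */
final class FlatFacetCounter {

    /** Hypothetical stand-in for one segment's sorted-set doc values of a single field. */
    static final class Segment {
        final String[] terms;   // segment-local dictionary, sorted; array index = ordinal
        final int[][] docOrds;  // docOrds[doc] = ordinals of the values of that doc

        Segment(String[] terms, int[][] docOrds) {
            this.terms = terms;
            this.docOrds = docOrds;
        }
    }

    /** Counts the values of one field over the matching docs of every segment. */
    static Map<String, Integer> count(Segment[] segments, int[][] matchingDocs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (int s = 0; s < segments.length; s++) {
            Segment seg = segments[s];
            // Count by segment-local ordinal first: one int increment per value.
            int[] ordCounts = new int[seg.terms.length];
            for (int doc : matchingDocs[s]) {
                for (int ord : seg.docOrds[doc]) {
                    ordCounts[ord]++;
                }
            }
            // Ordinals are only meaningful inside their segment, so merge by label.
            for (int ord = 0; ord < ordCounts.length; ord++) {
                if (ordCounts[ord] > 0) {
                    totals.merge(seg.terms[ord], ordCounts[ord], Integer::sum);
                }
            }
        }
        return totals;
    }

    /** Two tiny made-up segments; matching docs are listed per segment. */
    static Map<String, Integer> demoCounts() {
        Segment seg0 = new Segment(
            new String[] {"lucene", "solr"},
            new int[][] {{0}, {1}, {0, 1}});
        Segment seg1 = new Segment(
            new String[] {"elasticsearch", "lucene"},
            new int[][] {{1}, {0}, {1}});
        int[][] matching = {{0, 2}, {0, 1, 2}};
        return count(new Segment[] {seg0, seg1}, matching);
    }

    public static void main(String[] args) {
        System.out.println(demoCounts()); // {elasticsearch=1, lucene=4, solr=1}
    }
}
```

Merging by label avoids building a global ordinal mapping up front, at the
price of one label lookup per distinct value seen in each segment; whether
that is cheaper overall depends on how many distinct values the field has.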


I know that with this approach I can't compute facet counts for more than
one field (facet) in a single search.

Loading only the specific fields at search time should, I hope, take less
time and memory than loading all the dimensions in
SortedSetDocValuesReaderState. Kindly help me solve this.


I think this keeps both index and search cost minimal: it should add no
overhead at index time, and it should perform better at search time.


Kindly post your suggestions.


Regards,
Chitra





Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?

Chitra R
Kindly post your suggestions.

Regards,
Chitra

Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?

Chitra R
Hi,
         When SortedSetDocValuesReaderState is opened at search time, is the
information from the whole doc values files (.dvd and .dvm) loaded into
memory, or only the information for the specified field (say, the $facets
field)?


Any help is much appreciated.


Regards,
Chitra


Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?

Michael McCandless-2
Doc values fields are never loaded into memory; at most some small
index structures are.

When you use those fields, the bytes (for just the one doc values
field you are using) are pulled from disk, and the OS will cache them
in memory if available.
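
As a rough illustration of "only the bytes you use are pulled from disk",
here is a toy Java sketch. It is an assumption-laden model and not Lucene's
doc values format: the file layout, the field contents, and the names
FieldSliceReader, readSlice and demo are all made up. It writes two fields
back to back into one file, then memory-maps and reads only the second
field's byte range, leaving the caching to the OS.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Toy model of reading one field's bytes from a shared data file (not Lucene's format). */
final class FieldSliceReader {

    /** Maps only [offset, offset + length) of the file and decodes it as ASCII. */
    static String readSlice(Path file, long offset, int length) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // Mapping reserves address space; the OS pages bytes in on access
            // and may keep them in its page cache while memory is available.
            MappedByteBuffer slice =
                channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
            byte[] bytes = new byte[length];
            slice.get(bytes);
            return new String(bytes, StandardCharsets.US_ASCII);
        }
    }

    /** Writes two hypothetical fields into one file, then reads only the second. */
    static String demo() {
        try {
            Path file = Files.createTempFile("docvalues-model", ".bin");
            file.toFile().deleteOnExit();
            byte[] author = "author-bytes".getBytes(StandardCharsets.US_ASCII);
            byte[] facets = "facets-bytes".getBytes(StandardCharsets.US_ASCII);
            byte[] all = new byte[author.length + facets.length];
            System.arraycopy(author, 0, all, 0, author.length);
            System.arraycopy(facets, 0, all, author.length, facets.length);
            Files.write(file, all);
            // A (hypothetical) metadata entry would record where each field starts.
            return readSlice(file, author.length, facets.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // facets-bytes
    }
}
```

The bytes of the first field are never touched by the read, which is the
point of the model: what ends up in memory is decided by access patterns and
the OS cache, not by opening the file.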

Mike McCandless

http://blog.mikemccandless.com


>>>> >>> > On Sat, Nov 12, 2016 at 4:03 AM, Michael McCandless
>>>> >>> > <[hidden email]> wrote:
>>>> >>> >>
>>>> >>> >> On Fri, Nov 11, 2016 at 5:21 AM, Chitra R <[hidden email]>
>>>> >>> >> wrote:
>>>> >>> >>
>>>> >>> >> >         i)Hope, when opening SortedSetDocValuesReaderState , we
>>>> >>> >> > are
>>>> >>> >> > calculating ordinals( this will be used to calculate facet
>>>> >>> >> > count )
>>>> >>> >> > for
>>>> >>> >> > doc
>>>> >>> >> > values field and this only made the state instance somewhat
>>>> >>> >> > costly.
>>>> >>> >> >                       Am I right or any other reason behind
>>>> >>> >> > that?
>>>> >>> >>
>>>> >>> >> That's correct.  It adds some latency to an NRT refresh, and some
>>>> >>> >> heap
>>>> >>> >> used to hold the ordinal mappings.
>>>> >>> >>
>>>> >>> >> >          ii) During indexing, we are providing facet ordinals
>>>> >>> >> > in
>>>> >>> >> > each
>>>> >>> >> > doc
>>>> >>> >> > and I think it will be useful in search side, to calculate
>>>> >>> >> > facet
>>>> >>> >> > counts
>>>> >>> >> > only for matching docs.  otherwise, it carries any other
>>>> >>> >> > benefits?
>>>> >>> >>
>>>> >>> >> Well, compared to the taxonomy facets, SSDV facets don't require
>>>> >>> >> a
>>>> >>> >> separate index.
>>>> >>> >>
>>>> >>> >> But they add latency/heap usage, and they cannot do hierarchical
>>>> >>> >> facets yet (though this could be fixed if someone just built it).
>>>> >>> >>
>>>> >>> >> >          iii) Is SortedSetDocValuesReaderState thread-safe (ie)
>>>> >>> >> > multiple
>>>> >>> >> > threads can call this method concurrently?
>>>> >>> >>
>>>> >>> >> Yes.
>>>> >>> >>
>>>> >>> >> Mike McCandless
>>>> >>> >>
>>>> >>> >> http://blog.mikemccandless.com
>>>> >>> >
>>>> >>> >
>>>> >>
>>>> >>
>>>> >
>>>
>>>
>>
>
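
Editorial note: the quoted exchange above says a fixed-depth hierarchy such as year/month/day can be emulated on top of flat facets. One way to picture that is to expand each path into all of its prefixes and treat every prefix as an ordinary flat label. The sketch below is plain Java with no Lucene calls; the class and method names (FlatPathFacets, prefixes, childCounts) are hypothetical, and in a real index each prefix would presumably be added as its own flat facet value. This is an illustration under those assumptions, not how the facet module does it internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: emulates hierarchical counts using only flat labels.
final class FlatPathFacets {

    // Expands "2010/10/15" into ["2010", "2010/10", "2010/10/15"].
    static List<String> prefixes(String path) {
        List<String> out = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        for (String part : path.split("/")) {
            if (sb.length() > 0) {
                sb.append('/');
            }
            sb.append(part);
            out.add(sb.toString());
        }
        return out;
    }

    // Counts the direct children of `parent` across the matching documents.
    // Each element of docPaths is the full path of one matching document.
    // An empty parent means "count the top-level labels".
    static Map<String, Integer> childCounts(List<String> docPaths, String parent) {
        String prefix = parent.isEmpty() ? "" : parent + "/";
        Map<String, Integer> counts = new TreeMap<>();
        for (String path : docPaths) {
            for (String label : prefixes(path)) {
                boolean underParent = label.startsWith(prefix) && label.length() > prefix.length();
                boolean directChild = label.indexOf('/', prefix.length()) < 0;
                if (underParent && directChild) {
                    counts.merge(label, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

Drilling down from "2010" to "2010/10" then only means changing the parent passed to childCounts (and, in a real search, the drill-down filter). Variable-depth hierarchies work with the same prefix expansion, but the number of labels per document is no longer fixed.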

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?

Chitra R
Thank you so much, Mike. I have learned a lot about doc values
faceting, and all my doubts are cleared up. Thanks!


*Another use case:*

After getting the matching documents for a given query, is there any way to
calculate the min and max values of a NumericDocValuesField (say, a date field)?


I would like to use this for numeric range faceting, by splitting the
numeric values (taken from the matching documents) into ranges.


Chitra


On Wed, Nov 30, 2016 at 3:51 AM, Michael McCandless <
[hidden email]> wrote:

> Doc values fields are never loaded into memory; at most some small
> index structures are.
>
> When you use those fields, the bytes (for just the one doc values
> field you are using) are pulled from disk, and the OS will cache them
> in memory if available.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Mon, Nov 28, 2016 at 6:01 AM, Chitra R <[hidden email]> wrote:
> > Hi,
> >          When opening SortedSetDocValuesReaderState at search time,
> whether
> > the whole doc value files (.dvd & .dvm) information are loaded in memory
> or
> > specified field information(say $facets field) alone load in memory?
> >
> >
> >
> >
> > Any help is much appreciated.
> >
> >
> > Regards,
> > Chitra
> >
> > On Tue, Nov 22, 2016 at 5:47 PM, Chitra R <[hidden email]> wrote:
> >>
> >>
> >> Kindly post your suggestions.
> >>
> >> Regards,
> >> Chitra
> >>
> >> On Sat, Nov 19, 2016 at 1:38 PM, Chitra R <[hidden email]>
> wrote:
> >>>
> >>> Hey, I got it clearly. Thank you so much. Could you please help us to
> >>> implement it in our use case?
> >>>
> >>>
> >>> In our case, we are having dynamic index and it is variable depth too.
> So
> >>> flat facet is enough.No need of hierarchical facets.
> >>>
> >>> What I think is,
> >>>
> >>> Index my facet field as normal doc value field, so that no special
> >>> operation (like taxonomy and sorted set doc values facet field) will
> be done
> >>> at index time and only doc value field stores its ordinals in their
> >>> respective field.
> >>> At search time, I will pass query (user search query) , filter (path
> >>> traversed list)  and collect the matching documents in Facetscollector.
> >>> To compute facet count for the specific field, I will gather those
> >>> resulted docs, then move through each segment for collecting the
> matching
> >>> ordinals using AtomicReader.
> >>>
> >>>
> >>> And know when I use this means, can't calculate facet count for more
> than
> >>> one field(facet) in a search.
> >>>
> >>> Instead of loading all the dimensions in DocValuesReaderState (will
> take
> >>> more time and memory) at search time, loading specific fields will
> take less
> >>> time and memory, hope so. Kindly help to solve.
> >>>
> >>>
> >>> It will do it in a minimal index and search cost, I think. And hope
> this
> >>> won't put overload at index time, also at search time this will be
> better.
> >>>
> >>>
> >>> Kindly post your suggestions.
> >>>
> >>>
> >>> Regards,
> >>> Chitra
> >>>
> >>>
> >>>
> >>>
> >>> On Fri, Nov 18, 2016 at 7:15 PM, Michael McCandless
> >>> <[hidden email]> wrote:
> >>>>
> >>>> I think you've summed up exactly the differences!
> >>>>
> >>>> And, yes, it would be possible to emulate hierarchical facets on top
> >>>> of flat facets, if the hierarchy is fixed depth like year/month/day.
> >>>>
> >>>> But if it's variable depth, it's trickier (but I think still
> >>>> possible).  See e.g. the Committed Paths drill-down on the left, on
> >>>> our dog-food server
> >>>> http://jirasearch.mikemccandless.com/search.py?index=jira
> >>>>
> >>>> Mike McCandless
> >>>>
> >>>> http://blog.mikemccandless.com
> >>>>
> >>>>
> >>>> On Fri, Nov 18, 2016 at 1:43 AM, Chitra R <[hidden email]>
> wrote:
> >>>> > case 1:
> >>>> >         In taxonomy, for each indexed document, examines facet
> label ,
> >>>> > computes their ordinals and mappings, and which will be stored in
> >>>> > sidecar
> >>>> > index at index time.
> >>>> >
> >>>> > case 2:
> >>>> >         In doc values, these(ordinals) are computed at search time,
> so
> >>>> > there
> >>>> > will be a time and memory trade-off between both cases, hope so.
> >>>> >
> >>>> >
> >>>> > In taxonomy, building hierarchical facets at index time makes
> faceting
> >>>> > cost
> >>>> > minimal at search time than flat facets in doc values.
> >>>> >
> >>>> > Except (memory,time and NRT latency) , Is any another contrast
> between
> >>>> > hierarchical and flat facets at search time?
> >>>> >
> >>>> >
> >>>> > Kindly post your suggestions...
> >>>> >
> >>>> >
> >>>> > Regards,
> >>>> > Chitra
> >>>> >
> >>>> > On Thu, Nov 17, 2016 at 6:40 PM, Chitra R <[hidden email]>
> >>>> > wrote:
> >>>> >>
> >>>> >> Okay. I agree with you, Taxonomy maintains and supports
> hierarchical
> >>>> >> facets during indexing. Hope hierarchical in the sense, we might
> >>>> >> index the
> >>>> >> field Publish date : 2010/10/15 as Publish date: 2010 , Publish
> date:
> >>>> >> 2010/10 and Publish date: 2010/10/15 , their facet ordinals are
> >>>> >> maintained
> >>>> >> in sidecar index and it is mapped to the main index.
> >>>> >>
> >>>> >> For example:
> >>>> >>
> >>>> >>                 In search-lucene.com , I enter a term (say facet),
> >>>> >> top
> >>>> >> documents and their categories are displayed after performing the
> >>>> >> search.
> >>>> >> Say I drill down through Publish date/2010 to collect its child
> >>>> >> counts and
> >>>> >> after I will pass through publishdate/2010/10 to collect their
> child
> >>>> >> counts.
> >>>> >> And for each drill down, each search will be performed to collect
> its
> >>>> >> top
> >>>> >> docs and categories.
> >>>> >>
> >>>> >>
> >>>> >>                Even I can achieve this in flat facets by changing
> the
> >>>> >> drill down query.
> >>>> >>
> >>>> >> Am I right or missed anything? yet I don't know if I missed
> >>>> >> anything...
> >>>> >>
> >>>> >> So What is the need of hierarchical facets? Could you please
> explain
> >>>> >> it(hierarchical facets) in the real-world use case?
> >>>> >>
> >>>> >>
> >>>> >> Regards,
> >>>> >> Chitra
> >>>> >>
> >>>> >> On Wed, Nov 16, 2016 at 7:36 PM, Michael McCandless
> >>>> >> <[hidden email]> wrote:
> >>>> >>>
> >>>> >>> You store dimension + string (a single value path, since it's not
> >>>> >>> hierarchical) into SSDVFF so that you can compute facet counts,
> >>>> >>> either
> >>>> >>> ordinary drill down counts or the drill sideways counts.
> >>>> >>>
> >>>> >>> You can see examples of drill sideways at
> >>>> >>> http://jirasearch.mikemccandless.com, e.g. drill down on any of
> >>>> >>> those
> >>>> >>> fields on the left and you don't lose the previous facet counts
> for
> >>>> >>> that field.
> >>>> >>>
> >>>> >>> Mike McCandless
> >>>> >>>
> >>>> >>> http://blog.mikemccandless.com
> >>>> >>>
> >>>> >>>
> >>>> >>> On Wed, Nov 16, 2016 at 8:51 AM, Chitra R <[hidden email]>
> >>>> >>> wrote:
> >>>> >>> > Hi,
> >>>> >>> >
> >>>> >>> > Lucene-Drill sideways
> >>>> >>> >
> >>>> >>> > jira_issue:LUCENE-4748
> >>>> >>> >
> >>>> >>> >                                  Is this the reason( ie Drill
> >>>> >>> > sideways
> >>>> >>> > makes
> >>>> >>> > a very nice faceted search UI because we
> >>>> >>> > don't "lose" the facet counts after drilling in) behind storing
> >>>> >>> > path
> >>>> >>> > and
> >>>> >>> > dimension for the given SSDVF field? Else anything?
> >>>> >>> >
> >>>> >>> > Regards,
> >>>> >>> > Chitra
> >>>> >>> >
> >>>> >>> >
> >>>> >>> >      Hey, thank you so much for the fast response, I agree NRT
> >>>> >>> > refresh
> >>>> >>> > is
> >>>> >>> > somewhat costly operations and this is the major pitfall,
> suppose
> >>>> >>> > we
> >>>> >>> > use doc
> >>>> >>> > value faceting.
> >>>> >>> >
> >>>> >>> >
> >>>> >>> >                  While indexing SortedSetDocValuesFacetField ,
> it
> >>>> >>> > stores
> >>>> >>> > path and dimension of the given field internally. So Can we
> >>>> >>> > achieve
> >>>> >>> > hierarchical facets using DrillDownQuery? Hope, purpose of
> storing
> >>>> >>> > path
> >>>> >>> > and
> >>>> >>> > dimension is to achieve hierarchical facets. If yes (ie we can
> >>>> >>> > achieve
> >>>> >>> > hierarchy in SSDVFF) , so what is the need to move over
> taxonomy?
> >>>> >>> >  Else I missed anything?
> >>>> >>> >
> >>>> >>> >
> >>>> >>> >                  What is the real purpose to store path and
> >>>> >>> > dimension
> >>>> >>> > in
> >>>> >>> > SSDVF field?
> >>>> >>> >
> >>>> >>> >
> >>>> >>> > Kindly post your suggestions.
> >>>> >>> >
> >>>> >>> > Regards,
> >>>> >>> > Chitra
> >>>> >>> >
> >>>> >>> >
> >>>> >>> >
> >>>> >>> > On Sat, Nov 12, 2016 at 4:03 AM, Michael McCandless
> >>>> >>> > <[hidden email]> wrote:
> >>>> >>> >>
> >>>> >>> >> On Fri, Nov 11, 2016 at 5:21 AM, Chitra R <
> [hidden email]>
> >>>> >>> >> wrote:
> >>>> >>> >>
> >>>> >>> >> >         i)Hope, when opening SortedSetDocValuesReaderState ,
> we
> >>>> >>> >> > are
> >>>> >>> >> > calculating ordinals( this will be used to calculate facet
> >>>> >>> >> > count )
> >>>> >>> >> > for
> >>>> >>> >> > doc
> >>>> >>> >> > values field and this only made the state instance somewhat
> >>>> >>> >> > costly.
> >>>> >>> >> >                       Am I right or any other reason behind
> >>>> >>> >> > that?
> >>>> >>> >>
> >>>> >>> >> That's correct.  It adds some latency to an NRT refresh, and
> some
> >>>> >>> >> heap
> >>>> >>> >> used to hold the ordinal mappings.
> >>>> >>> >>
> >>>> >>> >> >          ii) During indexing, we are providing facet ordinals
> >>>> >>> >> > in
> >>>> >>> >> > each
> >>>> >>> >> > doc
> >>>> >>> >> > and I think it will be useful in search side, to calculate
> >>>> >>> >> > facet
> >>>> >>> >> > counts
> >>>> >>> >> > only for matching docs.  otherwise, it carries any other
> >>>> >>> >> > benefits?
> >>>> >>> >>
> >>>> >>> >> Well, compared to the taxonomy facets, SSDV facets don't
> require
> >>>> >>> >> a
> >>>> >>> >> separate index.
> >>>> >>> >>
> >>>> >>> >> But they add latency/heap usage, and they cannot do
> hierarchical
> >>>> >>> >> facets yet (though this could be fixed if someone just built
> it).
> >>>> >>> >>
> >>>> >>> >> >          iii) Is SortedSetDocValuesReaderState thread-safe
> (ie)
> >>>> >>> >> > multiple
> >>>> >>> >> > threads can call this method concurrently?
> >>>> >>> >>
> >>>> >>> >> Yes.
> >>>> >>> >>
> >>>> >>> >> Mike McCandless
> >>>> >>> >>
> >>>> >>> >> http://blog.mikemccandless.com
> >>>> >>> >
> >>>> >>> >
> >>>> >>
> >>>> >>
> >>>> >
> >>>
> >>>
> >>
> >
>

Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?

Shai Erera
This feature is not available in Lucene currently, but it shouldn't be hard
to add it. See Mike's comment here:
http://blog.mikemccandless.com/2013/05/dynamic-faceting-with-lucene.html?showComment=1412777154420#c363162440067733144

A trickier (yet nicer) feature would be to do it all in one go, i.e.
you'd say something like "facet on field price" and you'd get "interesting"
buckets, chosen according to the variance in the results.

But before that, we could have a StatsFacets in Lucene which provides some
statistics about a numeric field (min/max/avg etc.).
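
Editorial note: the logic being asked for (find min and max over the matching documents, then split that span into ranges and count) can be sketched without any Lucene API. In the plain-Java sketch below, a long[] indexed by doc ID stands in for the numeric doc values and an int[] stands in for the hits; the class and method names (RangeBuckets, minMax, counts) are hypothetical. It assumes every matching document has a value and that max - min does not overflow a long.

```java
// Hypothetical helper: min/max and equal-width range counts over matching docs.
final class RangeBuckets {

    // Returns {min, max} of the values of the matching documents.
    // matchingDocs must be non-empty.
    static long[] minMax(long[] values, int[] matchingDocs) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (int doc : matchingDocs) {
            long v = values[doc];
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return new long[] {min, max};
    }

    // Splits [min, max] into numBuckets equal-width ranges and counts hits per range.
    static int[] counts(long[] values, int[] matchingDocs, int numBuckets) {
        int[] counts = new int[numBuckets];
        if (matchingDocs.length == 0) {
            return counts;
        }
        long[] mm = minMax(values, matchingDocs);
        // The "+ 1" guarantees the max value falls in the last bucket, not past it.
        long width = (mm[1] - mm[0]) / numBuckets + 1;
        for (int doc : matchingDocs) {
            int bucket = (int) ((values[doc] - mm[0]) / width);
            counts[bucket]++;
        }
        return counts;
    }
}
```

This takes two passes over the hits: one for the bounds, one for the counts. Once the bounds are known, the existing range faceting in Lucene's facet module (LongRangeFacetCounts) could presumably be given the computed ranges instead of the hand-rolled counting loop.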


Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?

Chitra R
Thank you so much, Shai...

Chitra

On Wed, Nov 30, 2016 at 2:17 PM, Shai Erera <[hidden email]> wrote:

> This feature is not available in Lucene currently, but it shouldn't be hard
> to add it. See Mike's comment here:
> http://blog.mikemccandless.com/2013/05/dynamic-faceting-
> with-lucene.html?showComment=1412777154420#c363162440067733144
>
> One more tricky (yet nicer) feature would be to have it all in one go, i.e.
> you'd say something like "facet on field price" and you'd get "interesting"
> buckets, per the variance in the results.
>
> But before that, we could have a StatsFacets in Lucene which provide some
> statistics about a numeric field (min/max/avg etc.).
>
> On Wed, Nov 30, 2016 at 7:50 AM Chitra R <[hidden email]> wrote:
>
> > Thank you so much, mike... Hope, gained a lot of stuff on Doc
> > Values faceting and also clarified all my doubts. Thanks..!!
> >
> >
> > *Another use case:*
> >
> > After getting matching documents for the given query, Is there any way to
> > calculate min and max values on NumericDocValuesField ( say date field)?
> >
> >
> > I would like to implement it in numeric range faceting by splitting the
> > numeric values (getting from resulted documents) into ranges.
> >
> >
> > Chitra
> >
> >
> > On Wed, Nov 30, 2016 at 3:51 AM, Michael McCandless <
> > [hidden email]> wrote:
> >
> > > Doc values fields are never loaded into memory; at most some small
> > > index structures are.
> > >
> > > When you use those fields, the bytes (for just the one doc values
> > > field you are using) are pulled from disk, and the OS will cache them
> > > in memory if available.
> > >
> > > Mike McCandless
> > >
> > > http://blog.mikemccandless.com
> > >
> > >
> > > On Mon, Nov 28, 2016 at 6:01 AM, Chitra R <[hidden email]>
> wrote:
> > > > Hi,
> > > >          When opening SortedSetDocValuesReaderState at search time,
> > > whether
> > > > the whole doc value files (.dvd & .dvm) information are loaded in
> > memory
> > > or
> > > > specified field information(say $facets field) alone load in memory?
> > > >
> > > >
> > > >
> > > >
> > > > Any help is much appreciated.
> > > >
> > > >
> > > > Regards,
> > > > Chitra
> > > >
> > > > On Tue, Nov 22, 2016 at 5:47 PM, Chitra R <[hidden email]>
> > wrote:
> > > >>
> > > >>
> > > >> Kindly post your suggestions.
> > > >>
> > > >> Regards,
> > > >> Chitra
> > > >>
> > > >> On Sat, Nov 19, 2016 at 1:38 PM, Chitra R <[hidden email]>
> > > wrote:
> > > >>>
> > > >>> Hey, I got it clearly. Thank you so much. Could you please help us
> to
> > > >>> implement it in our use case?
> > > >>>
> > > >>>
> > > >>> In our case, we are having dynamic index and it is variable depth
> > too.
> > > So
> > > >>> flat facet is enough.No need of hierarchical facets.
> > > >>>
> > > >>> What I think is,
> > > >>>
> > > >>> Index my facet field as normal doc value field, so that no special
> > > >>> operation (like taxonomy and sorted set doc values facet field)
> will
> > > be done
> > > >>> at index time and only doc value field stores its ordinals in their
> > > >>> respective field.
> > > >>> At search time, I will pass query (user search query) , filter
> (path
> > > >>> traversed list)  and collect the matching documents in
> > Facetscollector.
> > > >>> To compute facet count for the specific field, I will gather those
> > > >>> resulted docs, then move through each segment for collecting the
> > > matching
> > > >>> ordinals using AtomicReader.
> > > >>>
> > > >>>
> > > >>> And know when I use this means, can't calculate facet count for
> more
> > > than
> > > >>> one field(facet) in a search.
> > > >>>
> > > >>> Instead of loading all the dimensions in DocValuesReaderState (will
> > > take
> > > >>> more time and memory) at search time, loading specific fields will
> > > take less
> > > >>> time and memory, hope so. Kindly help to solve.
> > > >>>
> > > >>>
> > > >>> It will do it in a minimal index and search cost, I think. And hope
> > > this
> > > >>> won't put overload at index time, also at search time this will be
> > > better.
> > > >>>
> > > >>>
> > > >>> Kindly post your suggestions.
> > > >>>
> > > >>>
> > > >>> Regards,
> > > >>> Chitra
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> On Fri, Nov 18, 2016 at 7:15 PM, Michael McCandless
> > > >>> <[hidden email]> wrote:
> > > >>>>
> > > >>>> I think you've summed up exactly the differences!
> > > >>>>
> > > >>>> And, yes, it would be possible to emulate hierarchical facets on
> top
> > > >>>> of flat facets, if the hierarchy is fixed depth like
> year/month/day.
> > > >>>>
> > > >>>> But if it's variable depth, it's trickier (but I think still
> > > >>>> possible).  See e.g. the Committed Paths drill-down on the left,
> on
> > > >>>> our dog-food server
> > > >>>> http://jirasearch.mikemccandless.com/search.py?index=jira
> > > >>>>
> > > >>>> Mike McCandless
> > > >>>>
> > > >>>> http://blog.mikemccandless.com
> > > >>>>
> > > >>>>
> > > >>>> On Fri, Nov 18, 2016 at 1:43 AM, Chitra R <[hidden email]> wrote:
> > > >>>> > case 1:
> > > >>>> >         With taxonomy, for each indexed document, the facet labels are
> > > >>>> > examined, and their ordinals and mappings are computed and stored in
> > > >>>> > the sidecar index at index time.
> > > >>>> >
> > > >>>> > case 2:
> > > >>>> >         With doc values, these (ordinals) are computed at search time,
> > > >>>> > so I think there is a time and memory trade-off between the two cases.
> > > >>>> >
> > > >>>> >
> > > >>>> > With taxonomy, building hierarchical facets at index time makes
> > > >>>> > faceting cheaper at search time than flat facets in doc values.
> > > >>>> >
> > > >>>> > Apart from memory, time and NRT latency, is there any other
> > > >>>> > difference between hierarchical and flat facets at search time?
> > > >>>> >
> > > >>>> >
> > > >>>> > Kindly post your suggestions...
> > > >>>> >
> > > >>>> >
> > > >>>> > Regards,
> > > >>>> > Chitra
> > > >>>> >
> > > >>>> > On Thu, Nov 17, 2016 at 6:40 PM, Chitra R <[hidden email]> wrote:
> > > >>>> >>
> > > >>>> >> Okay. I agree with you: taxonomy maintains and supports hierarchical
> > > >>>> >> facets during indexing. I take hierarchical to mean that we might
> > > >>>> >> index the field Publish date: 2010/10/15 as Publish date: 2010,
> > > >>>> >> Publish date: 2010/10 and Publish date: 2010/10/15; their facet
> > > >>>> >> ordinals are maintained in the sidecar index and mapped to the main
> > > >>>> >> index.
> > > >>>> >>
> > > >>>> >> For example:
> > > >>>> >>
> > > >>>> >>                 On search-lucene.com, I enter a term (say "facet"),
> > > >>>> >> and the top documents and their categories are displayed after the
> > > >>>> >> search is performed.
> > > >>>> >> Say I drill down through Publish date/2010 to collect its child
> > > >>>> >> counts, and afterwards through Publish date/2010/10 to collect its
> > > >>>> >> child counts.
> > > >>>> >> For each drill-down, a new search is performed to collect its top
> > > >>>> >> docs and categories.
> > > >>>> >>
> > > >>>> >>
> > > >>>> >>                I can achieve even this with flat facets by changing
> > > >>>> >> the drill-down query.
> > > >>>> >>
> > > >>>> >> Am I right, or have I missed anything?
> > > >>>> >>
> > > >>>> >> So what is the need for hierarchical facets? Could you please
> > > >>>> >> explain them (hierarchical facets) with a real-world use case?
> > > >>>> >>
> > > >>>> >>
> > > >>>> >> Regards,
> > > >>>> >> Chitra
> > > >>>> >>
> > > >>>> >> On Wed, Nov 16, 2016 at 7:36 PM, Michael McCandless
> > > >>>> >> <[hidden email]> wrote:
> > > >>>> >>>
> > > >>>> >>> You store dimension + string (a single value path, since it's not
> > > >>>> >>> hierarchical) into SSDVFF so that you can compute facet counts,
> > > >>>> >>> either ordinary drill down counts or the drill sideways counts.
> > > >>>> >>>
> > > >>>> >>> You can see examples of drill sideways at
> > > >>>> >>> http://jirasearch.mikemccandless.com, e.g. drill down on any of
> > > >>>> >>> those fields on the left and you don't lose the previous facet
> > > >>>> >>> counts for that field.
> > > >>>> >>>
> > > >>>> >>> Mike McCandless
> > > >>>> >>>
> > > >>>> >>> http://blog.mikemccandless.com
> > > >>>> >>>
> > > >>>> >>>
> > > >>>> >>> On Wed, Nov 16, 2016 at 8:51 AM, Chitra R <[hidden email]> wrote:
> > > >>>> >>> > Hi,
> > > >>>> >>> >
> > > >>>> >>> > Lucene drill sideways
> > > >>>> >>> >
> > > >>>> >>> > jira_issue:LUCENE-4748
> > > >>>> >>> >
> > > >>>> >>> > Is this the reason (i.e. drill sideways makes a very nice faceted
> > > >>>> >>> > search UI because we don't "lose" the facet counts after drilling
> > > >>>> >>> > in) behind storing path and dimension for the given SSDVF field?
> > > >>>> >>> > Or is there another reason?
> > > >>>> >>> >
> > > >>>> >>> > Regards,
> > > >>>> >>> > Chitra
> > > >>>> >>> >
> > > >>>> >>> >
> > > >>>> >>> >      Hey, thank you so much for the fast response. I agree that
> > > >>>> >>> > NRT refresh is a somewhat costly operation, and that this is the
> > > >>>> >>> > major pitfall if we use doc value faceting.
> > > >>>> >>> >
> > > >>>> >>> >
> > > >>>> >>> >                  When indexing a SortedSetDocValuesFacetField, the
> > > >>>> >>> > path and dimension of the given field are stored internally. So can
> > > >>>> >>> > we achieve hierarchical facets using DrillDownQuery? I assume the
> > > >>>> >>> > purpose of storing path and dimension is to achieve hierarchical
> > > >>>> >>> > facets. If yes (i.e. we can achieve hierarchy in SSDVFF), what is
> > > >>>> >>> > the need to move over to taxonomy? Or have I missed anything?
> > > >>>> >>> >
> > > >>>> >>> >
> > > >>>> >>> >                  What is the real purpose of storing path and
> > > >>>> >>> > dimension in the SSDVF field?
> > > >>>> >>> >
> > > >>>> >>> >
> > > >>>> >>> > Kindly post your suggestions.
> > > >>>> >>> >
> > > >>>> >>> > Regards,
> > > >>>> >>> > Chitra
> > > >>>> >>> >
> > > >>>> >>> >
> > > >>>> >>> >
> > > >>>> >>> > On Sat, Nov 12, 2016 at 4:03 AM, Michael McCandless
> > > >>>> >>> > <[hidden email]> wrote:
> > > >>>> >>> >>
> > > >>>> >>> >> On Fri, Nov 11, 2016 at 5:21 AM, Chitra R <[hidden email]> wrote:
> > > >>>> >>> >>
> > > >>>> >>> >> >         i) I assume that when opening
> > > >>>> >>> >> > SortedSetDocValuesReaderState we calculate the ordinals (which
> > > >>>> >>> >> > are used to calculate facet counts) for the doc values field,
> > > >>>> >>> >> > and that this is what makes the state instance somewhat costly.
> > > >>>> >>> >> >                       Am I right, or is there another reason
> > > >>>> >>> >> > behind that?
> > > >>>> >>> >>
> > > >>>> >>> >> That's correct.  It adds some latency to an NRT refresh, and some
> > > >>>> >>> >> heap used to hold the ordinal mappings.
> > > >>>> >>> >>
> > > >>>> >>> >> >          ii) During indexing, we provide facet ordinals in each
> > > >>>> >>> >> > doc, and I think this is useful on the search side, to calculate
> > > >>>> >>> >> > facet counts only for matching docs. Does it carry any other
> > > >>>> >>> >> > benefits?
> > > >>>> >>> >>
> > > >>>> >>> >> Well, compared to the taxonomy facets, SSDV facets don't require a
> > > >>>> >>> >> separate index.
> > > >>>> >>> >>
> > > >>>> >>> >> But they add latency/heap usage, and they cannot do hierarchical
> > > >>>> >>> >> facets yet (though this could be fixed if someone just built it).
> > > >>>> >>> >>
> > > >>>> >>> >> >          iii) Is SortedSetDocValuesReaderState thread-safe, i.e.
> > > >>>> >>> >> > can multiple threads use it concurrently?
> > > >>>> >>> >>
> > > >>>> >>> >> Yes.
> > > >>>> >>> >>
> > > >>>> >>> >> Mike McCandless
> > > >>>> >>> >>
> > > >>>> >>> >> http://blog.mikemccandless.com
> > > >>>> >>> >
> > > >>>> >>> >
> > > >>>> >>
> > > >>>> >>
> > > >>>> >
> > > >>>
> > > >>>
> > > >>
> > > >
> > >
> >
>
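
The quoted exchange above notes that a fixed-depth hierarchy such as year/month/day can be emulated on top of flat facets by indexing every prefix of the path (2010, 2010/10, 2010/10/15) as its own flat label. The following is only a minimal sketch of that idea in plain Java. It does not use the Lucene API, and the class name FlatPathFacets and the methods prefixes and childCounts are hypothetical names chosen for illustration. It also assumes each document carries a single path; in a real index each prefix would presumably be added as a separate flat facet value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustration only: emulates a fixed-depth hierarchy using flat labels. */
final class FlatPathFacets {

    /** Expands "2010/10/15" into ["2010", "2010/10", "2010/10/15"]. */
    static List<String> prefixes(String path) {
        List<String> out = new ArrayList<>();
        int slash = path.indexOf('/');
        while (slash >= 0) {
            out.add(path.substring(0, slash));
            slash = path.indexOf('/', slash + 1);
        }
        out.add(path);
        return out;
    }

    /**
     * Counts the direct children of parent among the paths of the matching
     * docs (one path per doc). An empty parent means the top level.
     */
    static Map<String, Integer> childCounts(List<String> matchingDocPaths, String parent) {
        int parentDepth = parent.isEmpty() ? 0 : parent.split("/").length;
        Map<String, Integer> counts = new TreeMap<>();
        for (String path : matchingDocPaths) {
            List<String> labels = prefixes(path);
            if (labels.size() <= parentDepth) {
                continue; // path is not deeper than the parent
            }
            if (parentDepth > 0 && !labels.get(parentDepth - 1).equals(parent)) {
                continue; // path is not under the parent
            }
            counts.merge(labels.get(parentDepth), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical publish dates of the docs matching a query.
        List<String> docs = List.of("2010/10/15", "2010/10/20", "2010/11/01", "2011/01/05");
        System.out.println(prefixes("2010/10/15"));
        System.out.println(childCounts(docs, ""));
        System.out.println(childCounts(docs, "2010"));
        System.out.println(childCounts(docs, "2010/10"));
    }
}
```

Drilling down then amounts to filtering on one of the flat labels (for example 2010/10) and asking for the counts one level deeper. With variable-depth paths the same expansion applies, but the caller no longer knows in advance how many levels exist, which is the trickier case mentioned in the thread.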