How to access DocValues inside a customized collector?

How to access DocValues inside a customized collector?

Lisheng Zhang
We need to use binary DocValues (in a customized collector) that were added during
indexing. I first tested with the standard TopScoreDocCollector, and it seems that we
need to:

LeafReaderContext => reader() => get binary iterator => advance to the correct
location

Is this the correct way, or is there a better API? Since we are already
at that docId, it seems to me that the binary DocValues should be readily
available.

Also, is there a way to look at the indexed data directly? (Luke seems obsolete, and
Marple does not work with Lucene 7.4.0 yet.)

Thanks very much for any help, Lisheng

Re: How to access DocValues inside a customized collector?

Erick Erickson
Which Luke are you using? I think this one is being maintained:
https://github.com/DmitryKey/luke

The Terms component directly accesses the indexed data and can be used
to poke around in it.

I'll skip the question about accessing DocValues, as I have to go back and look it up every time.
On Thu, Sep 20, 2018 at 6:23 PM Lisheng Zhang <[hidden email]> wrote:
> we need to use binary DocValues (in a customized collector) added during
> indexing [...]

Re: How to access DocValues inside a customized collector?

Lisheng Zhang
Erick: Thanks very much for the quick help. The Luke you referred to worked well (I
found that the binary DocValues did get indexed correctly).

However, I am still not sure how to efficiently access DocValues in a
collector.

"The Terms component directly access the indexed data and can be used
to poke around in the indexed data."

Could you elaborate a little, or roughly point me to source code where DocValues
are accessed inside a collector (Lucene or Solr
source code would be fine)?

Thanks again for the help!

On Thu, Sep 20, 2018 at 7:39 PM Erick Erickson <[hidden email]>
wrote:

> What Luke are you using? I think this one is being maintained:
> https://github.com/DmitryKey/luke
> [...]

Re: How to access DocValues inside a customized collector?

Mikhail Khludnev-2
I'm not sure why you are looking for something better, since this is already the
best API.
You can check the sample usage at
.FastTaxonomyFacetCounts.countAll(IndexReader); also note
FastTaxonomyFacetCounts.count(List<MatchingDocs>), where the DV iterator is
driven by the enclosing intersection.
SolrDocumentFetcher.decodeDVField(int, LeafReader, String) also does
exactly this.
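
For readers who do not have the facet sources at hand, the "walk every value" style of access can be sketched roughly as below. This is not the FastTaxonomyFacetCounts code itself, only the same iteration pattern as I understand it; the class name BinaryDocValuesScan, the method countDocsWithValue and the idea of simply counting documents are made up for the illustration.

```java
import java.io.IOException;

import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.Bits;

/** Hypothetical helper: visits every document that has a binary DocValues entry. */
public final class BinaryDocValuesScan {

    private BinaryDocValuesScan() {}

    /** Counts live documents that carry a value in the given binary DocValues field. */
    public static long countDocsWithValue(IndexReader reader, String field) throws IOException {
        long count = 0;
        // DocValues are per segment, so go leaf by leaf.
        for (LeafReaderContext context : reader.leaves()) {
            BinaryDocValues dv = context.reader().getBinaryDocValues(field);
            if (dv == null) {
                continue; // this segment has no binary DocValues for the field
            }
            Bits liveDocs = context.reader().getLiveDocs(); // null means "no deletions"
            // nextDoc() only stops on documents that actually have a value.
            for (int doc = dv.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = dv.nextDoc()) {
                if (liveDocs == null || liveDocs.get(doc)) {
                    // dv.binaryValue() would return the bytes of the current document here.
                    count++;
                }
            }
        }
        return count;
    }
}
```

Inside the loop, dv.binaryValue() gives the stored bytes; the returned BytesRef may be reused by the iterator, so it should be copied if it has to outlive the loop step.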

--
Sincerely yours
Mikhail Khludnev

RE: How to access DocValues inside a customized collector?

Uwe Schindler
Hi,

In general your approach is right, but you have to do it correctly. It depends on the Collector subclass you are using. The simplest option is to subclass SimpleCollector: https://lucene.apache.org/core/7_4_0/core/org/apache/lucene/search/SimpleCollector.html

There you have to override 2 methods:

doSetNextReader(LeafReaderContext context): Here you call context.reader().getBinaryDocValues(String field) *once* and save the result in a private, non-final member field of the collector, e.g. "actReaderdocValues".

In collect(docId) you can then call actReaderdocValues.advanceExact(docId) and retrieve the value. As collect() is always called "in order", it's safe to use advanceExact().
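
As a rough illustration of the two overrides described above, a collector along the following lines should work against the Lucene 7.x API. It is only a sketch: the class name BinaryDocValuesCollector, the list of collected values and the getValues() accessor are made up here, and needsScores() is the 7.x method (later major versions replaced it with scoreMode()).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.SimpleCollector;
import org.apache.lucene.util.BytesRef;

/** Hypothetical collector that gathers the binary DocValues of every hit. */
public class BinaryDocValuesCollector extends SimpleCollector {

    private final String field;
    private final List<BytesRef> values = new ArrayList<>();

    // Iterator for the current segment; replaced on every leaf, hence non-final.
    private BinaryDocValues actReaderdocValues;

    public BinaryDocValuesCollector(String field) {
        this.field = field;
    }

    @Override
    protected void doSetNextReader(LeafReaderContext context) throws IOException {
        // Called once per segment: fetch the iterator here, not in collect().
        // The result is null if this segment has no binary DocValues for the field.
        actReaderdocValues = context.reader().getBinaryDocValues(field);
    }

    @Override
    public void collect(int doc) throws IOException {
        // doc is relative to the current segment and arrives in increasing order,
        // so the forward-only iterator can simply be advanced to it.
        if (actReaderdocValues != null && actReaderdocValues.advanceExact(doc)) {
            // binaryValue() may return a reused BytesRef, so keep a copy.
            values.add(BytesRef.deepCopyOf(actReaderdocValues.binaryValue()));
        }
    }

    @Override
    public boolean needsScores() {
        return false; // Lucene 7.x API; this sketch never reads the score
    }

    public List<BytesRef> getValues() {
        return values;
    }
}
```

It could be run with something like searcher.search(query, new BinaryDocValuesCollector("payload")), where "payload" stands for whatever field name was used with BinaryDocValuesField at index time.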

Important: Don't get a new DocValues instance on each call to collect() and then call advanceExact() on it! That is only needed for out-of-order access. In combination with a collector (as above) you get maximum performance, as everything is per leaf reader and in order.
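
For contrast, the out-of-order case mentioned above might look like the sketch below: a one-off lookup for an index-wide doc ID, for example one taken from TopDocs after the search has finished. Since the 7.x DocValues iterators only move forward, a fresh iterator per lookup is hard to avoid here, which is exactly the cost a collector does not have to pay. The class name BinaryDocValuesLookup and the method lookup are hypothetical.

```java
import java.io.IOException;
import java.util.List;

import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.ReaderUtil;
import org.apache.lucene.util.BytesRef;

/** Hypothetical helper for random-access lookups by index-wide doc ID. */
public final class BinaryDocValuesLookup {

    private BinaryDocValuesLookup() {}

    /** Returns a copy of the value for globalDocId, or null if the document has none. */
    public static BytesRef lookup(IndexReader reader, String field, int globalDocId)
            throws IOException {
        List<LeafReaderContext> leaves = reader.leaves();
        // Find the segment that contains the document.
        LeafReaderContext leaf = leaves.get(ReaderUtil.subIndex(globalDocId, leaves));
        // A new iterator for every lookup: fine occasionally, wasteful per hit.
        BinaryDocValues dv = leaf.reader().getBinaryDocValues(field);
        int leafDocId = globalDocId - leaf.docBase; // convert to a segment-relative ID
        if (dv != null && dv.advanceExact(leafDocId)) {
            return BytesRef.deepCopyOf(dv.binaryValue());
        }
        return null;
    }
}
```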

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: [hidden email]

Re: How to access DocValues inside a customized collector?

Lisheng Zhang
Thanks very much, Uwe and Mikhail!

Your points are all very well taken. So far it seems to work well; I will
test more to verify the details.

Lisheng

On Fri, Sep 21, 2018 at 3:54 AM Uwe Schindler <[hidden email]> wrote:

> Hi,
>
> in general your approach is right, but you have to do it correctly. [...]