Read DocValue twice

Read DocValue twice

Vadim Gindin
Hi all,

I use DocValues in a scoring function: I have a field of integers that feed
into the scoring formula. My scorer computes the scoring function in two
places:
- in score()
- in explain()

I got the following error in explain:

Caused by: java.lang.IndexOutOfBoundsException
        at java.nio.Buffer.checkIndex(Buffer.java:540) ~[?:1.8.0_161]
        at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:253) ~[?:1.8.0_161]
        at org.apache.lucene.store.ByteBufferGuard.getByte(ByteBufferGuard.java:118) ~[lucene-core-7.1.0.jar:7.1.0 84c90ad2c0218156c840e19a64d72b8a38550659 - ubuntu - 2017-10-13 16:12:42]
        at org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.readByte(ByteBufferIndexInput.java:385) ~[lucene-core-7.1.0.jar:7.1.0 84c90ad2c0218156c840e19a64d72b8a38550659 - ubuntu - 2017-10-13 16:12:42]
        at org.apache.lucene.util.packed.DirectReader$DirectPackedReader8.get(DirectReader.java:145) ~[lucene-core-7.1.0.jar:7.1.0 84c90ad2c0218156c840e19a64d72b8a38550659 - ubuntu - 2017-10-13 16:12:42]
        at org.apache.lucene.codecs.lucene70.Lucene70DocValuesProducer$3.longValue(Lucene70DocValuesProducer.java:481) ~[lucene-core-7.1.0.jar:7.1.0 84c90ad2c0218156c840e19a64d72b8a38550659 - ubuntu - 2017-10-13 16:12:42]
        at org.apache.lucene.index.SingletonSortedNumericDocValues.nextValue(SingletonSortedNumericDocValues.java:73) ~[lucene-core-7.1.0.jar:7.1.0 84c90ad2c0218156c840e19a64d72b8a38550659 - ubuntu - 2017-10-13 16:12:42]

I've found the following comment in the source code of
SortedNumericDocValues.java:

/**
 * Iterates to the next value in the current document.  Do not call this more than {@link #docValueCount} times
 * for the document.
 */
public abstract long nextValue() throws IOException;


Questions:
1) Why can't I read the values twice?
2) How can I handle this situation?
3) Would it work with NumericDocValues?

Regards,
Vadim Gindin

Re: Read DocValue twice

Adrien Grand
If you want to read the values again, you need to call setDocument (Lucene
< 7.0) or advanceExact (Lucene >= 7.0) before calling nextValue().
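
To illustrate the contract described above, here is a minimal sketch in plain Java. It does not use Lucene: ToySortedNumericValues is a hypothetical in-memory stand-in that only borrows the method names advanceExact, docValueCount and nextValue from this thread. The assumption it encodes is that positioning on a document resets the per-document value cursor, so the values can be read again after another advanceExact call on the same document.

```java
import java.util.Arrays;

/** Hypothetical in-memory stand-in for a multi-valued doc-values iterator (not Lucene code). */
class ToySortedNumericValues {
    private final long[][] valuesByDoc; // valuesByDoc[doc] holds that document's values
    private int doc = -1;               // current document, -1 means not positioned yet
    private int cursor = 0;             // index of the next value to return for the current doc

    ToySortedNumericValues(long[][] valuesByDoc) {
        this.valuesByDoc = valuesByDoc;
    }

    /** Positions on target and resets the value cursor; target may not be behind the current doc. */
    boolean advanceExact(int target) {
        if (target < doc || target >= valuesByDoc.length) {
            throw new IllegalArgumentException(
                "target=" + target + " doc=" + doc + " maxDoc=" + valuesByDoc.length);
        }
        doc = target;
        cursor = 0;
        return valuesByDoc[doc].length > 0;
    }

    int docValueCount() {
        return valuesByDoc[doc].length;
    }

    /** Returns the next value of the current doc; this toy throws if it is called too often. */
    long nextValue() {
        if (cursor >= valuesByDoc[doc].length) {
            throw new IllegalStateException("nextValue() called more than docValueCount() times");
        }
        return valuesByDoc[doc][cursor++];
    }
}

class ReadTwiceDemo {
    /** Reads all values of one document after (re)positioning on it. */
    static long[] readAll(ToySortedNumericValues dv, int doc) {
        if (!dv.advanceExact(doc)) {
            return new long[0];
        }
        long[] out = new long[dv.docValueCount()];
        for (int i = 0; i < out.length; i++) {
            out[i] = dv.nextValue();
        }
        return out;
    }

    public static void main(String[] args) {
        ToySortedNumericValues dv = new ToySortedNumericValues(new long[][] {{7}, {}, {3, 9}});
        System.out.println(Arrays.toString(readAll(dv, 2))); // prints [3, 9]
        System.out.println(Arrays.toString(readAll(dv, 2))); // prints [3, 9] again
    }
}
```

Note that the toy, like the real iterator, cannot go back to an earlier document: re-reading works only for the current document or a later one.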


Re: Read DocValue twice

Vadim Gindin
I already make these calls in both cases. In both score() and explain() I
have the following code:

SortedNumericDocValues numDocVal = DocValues.getSortedNumeric(reader,
fieldName);
if (numDocVal != null && numDocVal.advanceExact(topList.doc)) {
    long val = numDocVal.nextValue();

    ..
}

I reuse the same DisiPriorityQueue of scorers in score() and explain().


Re: Read DocValue twice

Adrien Grand
Can you add some debug logging to see what the values of topList.doc and
reader.maxDoc() are before you call advanceExact?

What do you mean by "I reuse the same DisiPriorityQueue of scorers in
score() and explain()"? This shouldn't be possible.


Re: Read DocValue twice

Vadim Gindin
I have a scorer that is similar to DisjunctionScorer.java, with these fields:

private final DisiPriorityQueue subScorers;
private final DisjunctionDISIApproximation approximation;

They are initialized in the constructor like this:

   this.subScorers = new DisiPriorityQueue(subScorers.size());
   for (Scorer scorer : subScorers) {
       final DisiWrapper w = new DisiWrapper(scorer);
       this.subScorers.add(w);
   }
   this.approximation = new DisjunctionDISIApproximation(this.subScorers);



I use them in both score() and explain(). In explain() I call:

   this.approximation.advance(doc);

After that, the code is the same as in score(). I've also added logging.
Here is one of the log lines:

explain: doc=2147483647, field=params, maxDoc=67649

The doc value does not look right.



Re: Read DocValue twice

Adrien Grand
Yes, this is the problem. This doc ID is a special sentinel value that
means that the iterator is exhausted. I don't have enough context to know
what the exact problem is but there is a bug in your custom query.
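
For reference, the logged value 2147483647 equals Integer.MAX_VALUE, which is the value of Lucene's DocIdSetIterator.NO_MORE_DOCS. The sketch below is plain Java, not Lucene code: ToyDocIterator and positionedOn are hypothetical names used only to show how a forward-only iterator that was already consumed keeps returning the sentinel, and how a guard can reject it before any doc values are read. Whether the search had already exhausted the shared iterator in the custom scorer is an assumption, not something this thread confirms.

```java
/** Hypothetical forward-only doc-ID iterator over a sorted array (not Lucene code). */
class ToyDocIterator {
    /** Same numeric value as the sentinel seen in the log line: 2147483647. */
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs; // sorted, distinct matching doc IDs
    private int index = 0;    // position in docs; only ever increases
    private int doc = -1;     // current doc ID, -1 before the first advance

    ToyDocIterator(int[] sortedDocs) {
        this.docs = sortedDocs;
    }

    int docID() {
        return doc;
    }

    /** Moves to the first doc >= target, or to NO_MORE_DOCS; never moves backwards. */
    int advance(int target) {
        while (index < docs.length && docs[index] < target) {
            index++;
        }
        doc = index < docs.length ? docs[index] : NO_MORE_DOCS;
        return doc;
    }
}

class SentinelDemo {
    /** Returns true only if the iterator landed exactly on doc and doc is a valid ID. */
    static boolean positionedOn(ToyDocIterator it, int doc, int maxDoc) {
        int landed = it.advance(doc);
        return landed == doc && landed < maxDoc;
    }

    public static void main(String[] args) {
        int maxDoc = 10;
        ToyDocIterator used = new ToyDocIterator(new int[] {2, 5, 8});
        // Simulate a search that consumed the iterator to the end.
        while (used.advance(used.docID() + 1) != ToyDocIterator.NO_MORE_DOCS) {
            // each step visits one matching doc
        }
        // Advancing the exhausted iterator cannot rewind it.
        System.out.println(positionedOn(used, 5, maxDoc));
        // A fresh iterator over the same data can reach doc 5.
        System.out.println(positionedOn(new ToyDocIterator(new int[] {2, 5, 8}), 5, maxDoc));
    }
}
```

Under that assumption, the usual remedy is for explain() to obtain a fresh scorer and iterator for the segment rather than advancing one that a search has already used.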


Re: Read DocValue twice

Vadim Gindin
Adrien, would you be so kind as to look at my Query/Weight/Scorer? I'm attaching the files that contain them. I've removed unnecessary code such as initialization, toString, equals and so on. Finally, how should the iterator work: shouldn't it be possible to navigate through it twice, once in score() and once in explain()?

Regards, 
Vadim Gindin

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Read DocValue twice

Adrien Grand
I can't see any attachments. Did you forget to attach these files?

Le mar. 20 févr. 2018 à 11:30, Vadim Gindin <[hidden email]> a écrit :

> Adrien, would you be so kind to look at my Query/Weight/Scorer. I'm
> attaching those files containing them. I've removed unnecessary code like
> initialization, toString, equals and so on. Finally, how would iterator
> work correctly - it would be allowed to navigate through it twice - in
> score() and in explanation(). Isn't it?
>
> Regards,
> Vadim Gindin
>
> On Mon, Feb 19, 2018 at 10:55 PM, Adrien Grand <[hidden email]> wrote:
>
>> Yes, this is the problem. This doc ID is a special sentinel value that
>> means that the iterator is exhausted. I don't have enough context to know
>> what the exact problem is but there is a bug in your custom query.
>>
>> Le lun. 19 févr. 2018 à 16:07, Vadim Gindin <[hidden email]> a
>> écrit :
>>
>> > I have the scorer that is similar to DisjunctionScorer.java with
>> >
>> > private final DisiPriorityQueue subScorers;
>> > private final DisjunctionDISIApproximation approximation;
>> >
>> > They are initialized in a constructor like that:
>> >
>> >    this.subScorers = new DisiPriorityQueue(subScorers.size());
>> >    for (Scorer scorer : subScorers) {
>> >        final DisiWrapper w = new DisiWrapper(scorer);
>> >        this.subScorers.add(w);
>> >    }
>> >    this.approximation = new
>> DisjunctionDISIApproximation(this.subScorers);
>> >
>> >
>> >
>> > I use them in score() and in explain(). In explain() I do
>> >
>> >    this.approximation.advance(doc);
>> >
>> > And further the same code as in score(). I've also added logging. And
>> > here is the one string:
>> >
>>
> > explain: doc=2147483647 <(214)%20748-3647> <(214)%20748-3647>,
>> field=params, maxDoc=67649
>
>
>> >
>> > doc looks not so good..
>> >
>> >
>> > On Mon, Feb 19, 2018 at 7:32 PM, Adrien Grand <[hidden email]>
>> wrote:
>> >
>> > > Can you add some debug logging to see what the values of topList.doc
>> and
>> > > reader.maxDoc() are before before you call advanceExact?
>> > >
>> > > What do you mean by "I reuse the same DisiPriorityQueue of scorers in
>> > > score() and explain()". This shouldn't be possible.
>> > >
>> > > Le lun. 19 févr. 2018 à 15:23, Vadim Gindin <[hidden email]> a
>> > > écrit :
>> > >
>> > > > I use these calls in both cases. In score() and explain() I have the
>> > > > following code:
>> > > >
>> > > > SortedNumericDocValues numDocVal =
>> DocValues.getSortedNumeric(reader,
>> > > > fieldName);
>> > > > if (numDocVal != null && numDocVal.advanceExact(topList.doc)) {
>> > > >     long val = numDocVal.nextValue();
>> > > >
>> > > >     ..
>> > > > }
>> > > >
>> > > > I reuse the same DisiPriorityQueue of scorers in score() and
>> explain().
>> > > >
>> > > > On Mon, Feb 19, 2018 at 6:54 PM, Adrien Grand <[hidden email]>
>> > wrote:
>> > > >
>> > > > > If you want to read the values again, you need to call setDocument
>> > > > (Lucene
>> > > > > < 7.0) or advanceExact (Lucene >= 7.0) before calling nextValue().
>> > > > >
>> > > > > Le lun. 19 févr. 2018 à 14:41, Vadim Gindin <[hidden email]>
>> a
>> > > > > écrit :
>> > > > >
>> > > > > > Hi all
>> > > > > >
>> > > > > > I use DocValue for scoring function. I.e. I have some column
>> with
>> > > > > integers,
>> > > > > > that are used in scoring formula. So I have a scorer that
>> > calculates
>> > > > > > scoring function twice:
>> > > > > > - in score()
>> > > > > > - in explain()
>> > > > > >
>> > > > > > I got the following error in explain:
>> > > > > >
>> > > > > > Caused by: java.lang.IndexOutOfBoundsException
>> > > > > >         at java.nio.Buffer.checkIndex(Buffer.java:540)
>> > > ~[?:1.8.0_161]
>> > > > > >         at
>> java.nio.DirectByteBuffer.get(DirectByteBuffer.java:253)
>> > > > > > ~[?:1.8.0_161]
>> > > > > >         at
>> > > > > > org.apache.lucene.store.ByteBufferGuard.getByte(
>> > > > > ByteBufferGuard.java:118)

Re: Read DocValue twice

Vadim Gindin
It seems that files attached to an email do not come through to the list. Here they are:

ConstTermScorer.java
<http://lucene.472066.n3.nabble.com/file/t493564/ConstTermScorer.java>  
PrizeDisjunctionScorer.java
<http://lucene.472066.n3.nabble.com/file/t493564/PrizeDisjunctionScorer.java>  
PhraseQuery.java
<http://lucene.472066.n3.nabble.com/file/t493564/PhraseQuery.java>  



--
Sent from: http://lucene.472066.n3.nabble.com/Lucene-Java-Users-f532864.html


Re: Read DocValue twice

Vadim Gindin
The test gives the following error, thrown from the explain() method:

java.lang.AssertionError: Docs enums are only supposed to be consumed in
the thread in which they have been acquired. But was acquired in
Thread[elasticsearch[node_s2][search][T#4],5,TGRP-CustomQueryParserIT] and
consumed in
Thread[elasticsearch[node_s2][search][T#2],5,TGRP-CustomQueryParserIT].
        at __randomizedtesting.SeedInfo.seed([935231818B6C9F26]:0)
        at org.apache.lucene.index.AssertingLeafReader.assertThread(AssertingLeafReader.java:42)
        at org.apache.lucene.index.AssertingLeafReader.access$000(AssertingLeafReader.java:36)
        at org.apache.lucene.index.AssertingLeafReader$AssertingPostingsEnum.advance(AssertingLeafReader.java:330)
        at org.apache.lucene.search.DisjunctionDISIApproximation.advance(DisjunctionDISIApproximation.java:66)
        at com.detectum.query.phrase.PrizeDisjunctionScorer.explain(PrizeDisjunctionScorer.java:220)



On Tue, Feb 20, 2018 at 8:03 PM, Vadim Gindin <[hidden email]> wrote:

> Probably it is not possible to attach files from email letter. Here they
> are:
>
> ConstTermScorer.java
> <http://lucene.472066.n3.nabble.com/file/t493564/ConstTermScorer.java>
> PrizeDisjunctionScorer.java
> <http://lucene.472066.n3.nabble.com/file/t493564/PrizeDisjunctionScorer.java>
> PhraseQuery.java
> <http://lucene.472066.n3.nabble.com/file/t493564/PhraseQuery.java>

Re: Read DocValue twice

Adrien Grand
This might not solve all problems, but you should stop caching the weight
in the query and stop caching the scorer in the weight: just create a new
scorer in calls to explain().
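
A minimal sketch of this advice, written with plain-Java stand-ins rather than the real Lucene classes. ToyScorer, ToyWeight and the formula "1.0 + boost" are hypothetical and only model the forward-only contract of a scorer; the point is that explain() builds its own scorer instead of reusing one that a search thread has already moved forward.

```java
// Plain-Java stand-ins, NOT Lucene classes: ToyScorer and ToyWeight are hypothetical.
final class ToyScorer {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs;    // sorted IDs of matching documents
    private final long[] boosts; // per-document integer used in the formula
    private int pos = -1;        // -1 means "not positioned yet"

    ToyScorer(int[] docs, long[] boosts) {
        this.docs = docs;
        this.boosts = boosts;
    }

    int docID() {
        if (pos < 0) return -1;
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }

    // Moves forward to the first match >= target; moving backwards is an error.
    int advance(int target) {
        if (target <= docID()) {
            throw new IllegalStateException("already at or past doc " + target);
        }
        do {
            pos++;
        } while (pos < docs.length && docs[pos] < target);
        return docID();
    }

    // Made-up scoring formula for the document the scorer is positioned on.
    float score() {
        return 1.0f + boosts[pos];
    }
}

final class ToyWeight {
    private final int[] docs;
    private final long[] boosts;

    ToyWeight(int[] docs, long[] boosts) {
        this.docs = docs;
        this.boosts = boosts;
    }

    // Always hands out a fresh scorer; nothing is cached between calls.
    ToyScorer scorer() {
        return new ToyScorer(docs, boosts);
    }

    // Builds its own scorer, so it works for any doc regardless of earlier calls.
    String explain(int doc) {
        ToyScorer s = scorer();
        if (s.advance(doc) != doc) {
            return "doc " + doc + ": no match";
        }
        return "doc " + doc + ": score=" + s.score();
    }
}

final class FreshScorerDemo {
    public static void main(String[] args) {
        ToyWeight weight = new ToyWeight(new int[] {2, 5, 9}, new long[] {7, 3, 1});
        System.out.println(weight.explain(5));
        System.out.println(weight.explain(2)); // fine: explain() starts from a fresh scorer
    }
}
```

With a cached scorer, the second call would try to move from doc 5 back to doc 2 and fail, which is the kind of misuse the advice is meant to avoid.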

On Wed, Feb 21, 2018 at 14:05, Vadim Gindin <[hidden email]> wrote:

> The test gives the following error:
>
> java.lang.AssertionError: Docs enums are only supposed to be consumed in
> the thread in which they have been acquired. But was acquired in
> Thread[elasticsearch[node_s2][search][T#4],5,TGRP-CustomQueryParserIT] and
> consumed in
> Thread[elasticsearch[node_s2][search][T#2],5,TGRP-CustomQueryParserIT].
> at __randomizedtesting.SeedInfo.seed([935231818B6C9F26]:0)
> at org.apache.lucene.index.AssertingLeafReader.assertThread(AssertingLeafReader.java:42)
> at org.apache.lucene.index.AssertingLeafReader.access$000(AssertingLeafReader.java:36)
> at org.apache.lucene.index.AssertingLeafReader$AssertingPostingsEnum.advance(AssertingLeafReader.java:330)
> at org.apache.lucene.search.DisjunctionDISIApproximation.advance(DisjunctionDISIApproximation.java:66)
> at com.detectum.query.phrase.PrizeDisjunctionScorer.explain(PrizeDisjunctionScorer.java:220)
>
> from explain() method.

Re: Read DocValue twice

Vadim Gindin
Adrien, thanks a lot! This looks like a working solution for my bugs. I
really appreciate it.

One question: is it really efficient to create a Scorer for every
document? Is Scorer designed to be lightweight and fast enough for that?

On Wed, Feb 21, 2018 at 6:42 PM, Adrien Grand <[hidden email]> wrote:

> This might not solve all problems, but you should stop caching the weight
> in the query and stop caching the scorer in the weight: just create a new
> scorer in calls to explain().

Re: Read DocValue twice

Adrien Grand
If you are talking about explanations, then yes, it's fine. explain() is
used for debugging, so it is fine if it is slow. However, when it comes to
actually running a query, Lucene creates only one Scorer for all documents
of an entire segment.
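
The query-time side of this can be pictured with a small plain-Java sketch. ToySegment, SegmentScorer and the "value * 0.5" formula are hypothetical stand-ins, not Lucene classes; the sketch only illustrates that one scorer is created per segment and then reused for every matching document in it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in for one index segment: sorted doc IDs plus a per-doc value.
final class ToySegment {
    final int[] docs;
    final long[] values;

    ToySegment(int[] docs, long[] values) {
        this.docs = docs;
        this.values = values;
    }
}

// Forward-only scorer over a single segment (hypothetical, not Lucene's Scorer).
final class SegmentScorer {
    static int created = 0; // counts instances, for demonstration only
    private final ToySegment seg;
    private int pos = -1;

    SegmentScorer(ToySegment seg) {
        this.seg = seg;
        created++;
    }

    // Steps to the next matching document; returns false when the segment is exhausted.
    boolean next() {
        return ++pos < seg.docs.length;
    }

    int docID() {
        return seg.docs[pos];
    }

    // Made-up scoring formula for the current document.
    float score() {
        return seg.values[pos] * 0.5f;
    }
}

final class PerSegmentSearch {
    // One scorer per segment, reused for every matching doc of that segment.
    static List<String> search(List<ToySegment> segments) {
        List<String> hits = new ArrayList<>();
        for (int ord = 0; ord < segments.size(); ord++) {
            SegmentScorer scorer = new SegmentScorer(segments.get(ord));
            while (scorer.next()) {
                hits.add(ord + ":" + scorer.docID() + "=" + scorer.score());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<ToySegment> segments = Arrays.asList(
                new ToySegment(new int[] {0, 3}, new long[] {4, 6}),
                new ToySegment(new int[] {1}, new long[] {10}));
        System.out.println(search(segments));
        System.out.println("scorers created: " + SegmentScorer.created);
    }
}
```

Three documents are scored here but only two scorers are built, one for each segment.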

On Thu, Feb 22, 2018 at 07:06, Vadim Gindin <[hidden email]> wrote:

> Adrien, thanks a lot! This looks like a working solution for my bugs. I
> really appreciate it.
>
> One question: is it really efficient to create a Scorer for every
> document? Is Scorer designed to be lightweight and fast enough for that?

Re: Read DocValue twice

Vadim Gindin
I'd like to use the "explain" mechanism to output additional match
information: the scoring formula, detailed matching information and so on.
But it now seems that "explain" is slower even than simply logging the
matching information to a file from the score() method.

- What is the most efficient way to do this? Is it possible to speed up
"explain", for example by caching the scorer?
- Lucene uses a single Scorer (for an entire segment) when calling the
score() method. What about explain()?
- Can iterators really be read only once?

Regards,
Vadim Gindin

On Thu, Feb 22, 2018 at 3:03 PM, Adrien Grand <[hidden email]> wrote:

> If you are talking about explanations, then yes, it's fine. Explain() is
> used for debugging, it is fine if it is slow. However Lucene creates only
> one Scorer for all documents of an entire segment when it comes to actually
> running a query.

Re: Read DocValue twice

Adrien Grand
You are trying to use explain for more than what it has been designed for.
Calling explain on the top hits is fine, but it seems that you need/want to
do this for all matches. We don't have a solution for this.

Caching the scorer doesn't work since scorers can only be iterated once.
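
One way to stay within that limit is to score everything once, keep only the best few documents, and build explanations for those alone. The sketch below is plain Java and makes several assumptions: TopHitsExplain is a hypothetical class, the per-document integers live in a simple array instead of doc values, and the formula "1.0 + value" is invented for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Lucene: explain only the top n hits.
final class TopHitsExplain {
    // Made-up scoring formula over a per-document integer value.
    static float score(long value) {
        return 1.0f + value;
    }

    // Returns the IDs of the n best-scoring documents, best first.
    static List<Integer> topDocs(long[] values, int n) {
        Comparator<Integer> byScore = Comparator.comparingDouble(d -> score(values[d]));
        // Min-heap on score: the weakest of the current top n is evicted first.
        PriorityQueue<Integer> heap = new PriorityQueue<>(byScore);
        for (int doc = 0; doc < values.length; doc++) {
            heap.add(doc);
            if (heap.size() > n) {
                heap.poll();
            }
        }
        List<Integer> best = new ArrayList<>(heap);
        best.sort(byScore.reversed());
        return best;
    }

    // Explanations are built only for the few documents that made the cut.
    static List<String> explainTop(long[] values, int n) {
        List<String> out = new ArrayList<>();
        for (int doc : topDocs(values, n)) {
            out.add("doc " + doc + ": 1.0 + value(" + values[doc] + ") = " + score(values[doc]));
        }
        return out;
    }

    public static void main(String[] args) {
        long[] values = {3, 9, 1, 7};
        for (String line : explainTop(values, 2)) {
            System.out.println(line);
        }
    }
}
```

The cost of the explanation step then depends on n, not on the total number of matches.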

On Thu, Feb 22, 2018 at 12:11, Vadim Gindin <[hidden email]> wrote:

> I'd like to use the "explain" mechanism to output additional match
> information: the scoring formula, detailed matching information and so on.
> But it now seems that "explain" is slower even than simply logging the
> matching information to a file from the score() method.
>
> - What is the most efficient way to do this? Is it possible to speed up
> "explain", for example by caching the scorer?
> - Lucene uses a single Scorer (for an entire segment) when calling the
> score() method. What about explain()?
> - Can iterators really be read only once?
>
> Regards,
> Vadim Gindin

Re: Read DocValue twice

Vadim Gindin
Thank you very much for your time and effort.

On Thu, Feb 22, 2018 at 6:23 PM, Adrien Grand <[hidden email]> wrote:

> You are trying to use explain for more than what it has been designed for.
> Calling explain on the top hits is fine, but it seems that you need/want to
> do this for all matches. We don't have a solution for this.
>
> Caching the scorer doesn't work since scorers can only be iterated once.