Understanding Lucene unique fields

Joan LLuís Planas Papió
Hello,

I'm trying to understand why I'm getting duplicate results in the attached Java code.
While debugging, it seems the duplicates come from different segments. Increasing RAMBufferSizeMB until all the docs fit in a single segment seems to return only unique numbers.

Is there any way to get unique documents in a bulk query without having to cache them in an in-memory structure?

Thanks in advance!!


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Attachment: NumbersExample.java (3K)

Re: Understanding Lucene unique fields

Thomas Matthijs-2
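The doc ID passed to collect() is not global. It is relative to the current segment (leaf), so you either have to read the document through that segment's reader, or add the segment's docBase before asking the top-level searcher: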

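new SimpleCollector() {
    // Context of the segment (leaf) currently being collected.
    private LeafReaderContext context;

    @Override
    public void collect(final int doc) throws IOException {
        // Wrong: doc is segment-relative, but indexSearcher.doc() expects a global doc ID.
        // ids.add(indexSearcher.doc(doc).getField(ID_FIELD_NAME).stringValue());

        // Option 1: read the document through the segment's own reader.
        ids.add(context.reader().document(doc).getField(ID_FIELD_NAME).stringValue());

        // Option 2 (use instead of option 1, not both): convert to a global doc ID first.
        // ids.add(indexSearcher.doc(context.docBase + doc).getField(ID_FIELD_NAME).stringValue());
    }

    @Override
    protected void doSetNextReader(LeafReaderContext context) throws IOException {
        // Called once per segment, before collect() runs for that segment's docs.
        this.context = context;
    }

    @Override
    public ScoreMode scoreMode() {
        // Every hit must be visited and scores are not used.
        return ScoreMode.COMPLETE_NO_SCORES;
    }
}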
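
To see why segment-relative doc IDs show up as duplicates, here is a toy model in plain Java. It is only a sketch and does not use Lucene: the Segment class and the buildSegments, lookupGlobal and collectAll helpers are hypothetical names made up for illustration. Each segment numbers its docs from 0, and docBase is the global ID of its first doc.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of a segmented index (plain Java, NOT Lucene's API). */
class SegmentDocIdDemo {

    /** Hypothetical segment: stored ID values plus the global ID of its first doc. */
    static final class Segment {
        final int docBase;
        final List<String> storedIds;

        Segment(int docBase, List<String> storedIds) {
            this.docBase = docBase;
            this.storedIds = storedIds;
        }
    }

    /** Builds segments from groups of stored IDs, assigning consecutive docBase offsets. */
    public static List<Segment> buildSegments(List<List<String>> groups) {
        List<Segment> segments = new ArrayList<>();
        int docBase = 0;
        for (List<String> group : groups) {
            segments.add(new Segment(docBase, group));
            docBase += group.size();
        }
        return segments;
    }

    /** Resolves a GLOBAL doc ID to its stored ID, the way a top-level lookup would. */
    public static String lookupGlobal(List<Segment> segments, int globalDoc) {
        for (Segment s : segments) {
            int local = globalDoc - s.docBase;
            if (local >= 0 && local < s.storedIds.size()) {
                return s.storedIds.get(local);
            }
        }
        throw new IllegalArgumentException("no such doc: " + globalDoc);
    }

    /** Visits every doc segment by segment; addDocBase decides whether the ID is made global first. */
    public static List<String> collectAll(List<Segment> segments, boolean addDocBase) {
        List<String> ids = new ArrayList<>();
        for (Segment s : segments) {
            for (int doc = 0; doc < s.storedIds.size(); doc++) {
                int globalDoc = addDocBase ? s.docBase + doc : doc;
                ids.add(lookupGlobal(segments, globalDoc));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Segment> segments = buildSegments(Arrays.asList(
                Arrays.asList("1", "2", "3"),   // segment 0, docBase 0
                Arrays.asList("4", "5")));      // segment 1, docBase 3
        System.out.println(collectAll(segments, false)); // prints [1, 2, 3, 1, 2]
        System.out.println(collectAll(segments, true));  // prints [1, 2, 3, 4, 5]
    }
}
```

Without the docBase offset, docs 0 and 1 of the second segment resolve to docs 0 and 1 of the first one, which is the duplication described in the question. With a single segment docBase is 0, which fits the observation that a larger RAM buffer made the problem disappear.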

On Tue, 26 Nov 2019 at 14:00, Joan LLuís Planas Papió
<[hidden email]> wrote:

>
> Hello,
>
> I'm trying to understand why I'm getting duplicate results in the attached Java code.
> While debugging, it seems the duplicates come from different segments. Increasing RAMBufferSizeMB until all the docs fit in a single segment seems to return only unique numbers.
>
> Is there any way to get unique documents in a bulk query without having to cache them in an in-memory structure?
>
> Thanks in advance!!