[jira] [Updated] (LUCENE-5148) SortedSetDocValues caching / state

Sebastian Nagel (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-5148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-5148:

    Attachment: LUCENE-5148.patch

I tried to add auto-cloning to see its impact:
 - SortedSet instances are cached per-thread and cloned by SegmentCoreReaders when requested,
 - clones are only available for use in the current thread (no cloning of the index inputs).

So nothing changes for users, it just removes the trap mentioned in the summary. However, it requires codec implementers to implement clone() correctly so that two different instances on the same field can be used in parallel in the same thread. A test has been added to BaseDocValuesFormatTestCase to make sure all our impls do that correctly.
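To make the idea concrete, here is a minimal, self-contained sketch of clone-on-request. It is not the code from the attached patch and does not use Lucene classes: ToySortedSet, ToyReader and the "tags" field are hypothetical stand-ins, and the real SegmentCoreReaders logic may well differ. It only shows the property the new test is meant to check, namely that two instances obtained for the same field in the same thread keep independent iteration state while sharing the underlying data.

```java
import java.util.HashMap;
import java.util.Map;

public class CloneOnRequest {
    static final long NO_MORE_ORDS = -1L;

    /** Hypothetical stand-in for a stateful SORTED_SET iterator; not Lucene's class. */
    static final class ToySortedSet implements Cloneable {
        private final long[][] ordsByDoc; // shared, read-only data (never copied)
        private int doc;                  // per-instance iteration state
        private int pos;

        ToySortedSet(long[][] ordsByDoc) { this.ordsByDoc = ordsByDoc; }

        void setDocument(int docId) { doc = docId; pos = 0; }

        long nextOrd() {
            return pos < ordsByDoc[doc].length ? ordsByDoc[doc][pos++] : NO_MORE_ORDS;
        }

        /** Returns an instance with fresh state over the same underlying data. */
        @Override
        public ToySortedSet clone() {
            return new ToySortedSet(ordsByDoc);
        }
    }

    /** Hypothetical reader: caches one instance per thread and hands out clones. */
    static final class ToyReader {
        private final long[][] data;
        private final ThreadLocal<Map<String, ToySortedSet>> cache =
            ThreadLocal.withInitial(HashMap::new);

        ToyReader(long[][] data) { this.data = data; }

        ToySortedSet getSortedSet(String field) {
            ToySortedSet cached =
                cache.get().computeIfAbsent(field, f -> new ToySortedSet(data));
            return cached.clone(); // callers never see the cached instance itself
        }
    }

    /** Reads two documents in an interleaved fashion from two instances. */
    static long[] interleave(ToyReader reader) {
        ToySortedSet a = reader.getSortedSet("tags");
        ToySortedSet b = reader.getSortedSet("tags");
        a.setDocument(0);
        b.setDocument(1);
        return new long[] { a.nextOrd(), b.nextOrd(), a.nextOrd(), b.nextOrd() };
    }

    public static void main(String[] args) {
        ToyReader reader = new ToyReader(new long[][] {{0, 2}, {1, 3}});
        for (long ord : interleave(reader)) {
            System.out.println(ord); // prints 0, 1, 2, 3 on separate lines
        }
    }
}
```

In this sketch the cost of the fix is one small allocation per request, and the burden moves to whoever writes clone(): if it shared doc or pos with the original, the two instances would interfere again.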

Robert, what do you think?

> SortedSetDocValues caching / state
> ----------------------------------
>                 Key: LUCENE-5148
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5148
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-5148.patch
> I just spent some time digging into a bug which was due to the fact that SORTED_SET doc values are stateful (setDocument/nextOrd) and are cached per thread. So if you try to get two instances from the same field in the same thread, you will actually get the same instance and won't be able to iterate over ords of two documents in parallel.
> This is not necessarily a bug, and the behavior could simply be documented, but I think it would be nice if the API could prevent such mistakes, either by storing the state in a separate object or by cloning the SortedSetDocValues object in SegmentCoreReaders.getSortedSetDocValues.
> What do you think?
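
The trap described in the summary can be reproduced with a small, self-contained sketch. It deliberately avoids Lucene classes: ToySortedSet, ToyReader and the "tags" field are hypothetical names, and only the setDocument/nextOrd shape and the per-thread cache are taken from the description above.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedStateTrap {
    static final long NO_MORE_ORDS = -1L;

    /** Hypothetical stand-in for a stateful SORTED_SET iterator; not Lucene's class. */
    static final class ToySortedSet {
        private final long[][] ordsByDoc;
        private int doc; // current document
        private int pos; // position within the current document's ords

        ToySortedSet(long[][] ordsByDoc) { this.ordsByDoc = ordsByDoc; }

        void setDocument(int docId) { doc = docId; pos = 0; }

        long nextOrd() {
            return pos < ordsByDoc[doc].length ? ordsByDoc[doc][pos++] : NO_MORE_ORDS;
        }
    }

    /** Hypothetical reader that caches one instance per thread and per field. */
    static final class ToyReader {
        private final long[][] data;
        private final ThreadLocal<Map<String, ToySortedSet>> cache =
            ThreadLocal.withInitial(HashMap::new);

        ToyReader(long[][] data) { this.data = data; }

        ToySortedSet getSortedSet(String field) {
            // Same thread + same field => the very same object is returned.
            return cache.get().computeIfAbsent(field, f -> new ToySortedSet(data));
        }
    }

    public static void main(String[] args) {
        ToyReader reader = new ToyReader(new long[][] {{0, 2}, {1, 3}});
        ToySortedSet a = reader.getSortedSet("tags");
        ToySortedSet b = reader.getSortedSet("tags");
        a.setDocument(0);
        b.setDocument(1);                // silently repositions `a` as well
        System.out.println(a == b);      // prints true
        System.out.println(a.nextOrd()); // prints 1, an ord of doc 1, not of doc 0
    }
}
```

Nothing fails loudly here, which is what makes the mistake hard to spot: the caller simply reads ords belonging to the wrong document.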
