[jira] [Commented] (LUCENE-8496) Explore selective dimension indexing in BKDReader/Writer


JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16614910#comment-16614910 ]

Adrien Grand commented on LUCENE-8496:

It is a pity that the patch is so large given that the change is actually simple. I like the idea, and the patch looks very clean overall. I see you added validation for corner cases, such as rejecting dataDimensionCount>0 combined with indexDimensionCount==0. Out of curiosity, did your working copy already include LUCENE-7862 when you ran the benchmark? I have some minor comments on the patch; could you set up a pull request or use Apache Review Board to make it easier to comment on your changes and iterate?

> Explore selective dimension indexing in BKDReader/Writer
> --------------------------------------------------------
>                 Key: LUCENE-8496
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8496
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Nicholas Knize
>            Priority: Major
>         Attachments: LUCENE-8496.patch
> This issue explores adding a new feature to BKDReader/Writer that lets users select fewer dimensions for building the BKD index than the total number of dimensions specified for field encoding. This is useful for dimensional data that is needed to interpret the encoded field but is unnecessary (or inefficient) for building the index structure. One such example is {{LatLonShape}} encoding. The first 4 dimensions may be used to efficiently search/index the triangle using its precomputed bounding box as a 4D point, and the remaining dimensions can be used to encode the vertices of the tessellated triangle. This causes BKD to act much like an R-Tree for shape data, where search is distilled into a 4D point (instead of a more expensive 6D point) and the triangle is encoded using a portion of the remaining (non-indexed) dimensions. Fields that use all of their dimensions for indexing are not impacted and behave as they normally would.
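
To make the idea in the description concrete, here is a minimal, self-contained Java sketch. It does not use Lucene and is not the code from the attached patch. The class name, method names, and the 10-value layout (4 bounding-box values followed by 6 vertex coordinates) are all assumptions chosen for illustration; only the dataDimensionCount/indexDimensionCount check mirrors the corner case mentioned in the comment above.

```java
import java.util.Arrays;

// Illustrative sketch only: hypothetical names and layout, not Lucene's API.
final class SelectiveDimsSketch {

    // Assumed layout: 10 data dimensions, of which the leading 4 are indexed.
    static final int DATA_DIMS = 10;
    static final int INDEX_DIMS = 4;

    /** Rejects inconsistent dimension counts, including data dims > 0 with no indexed dims. */
    static void validate(int dataDimensionCount, int indexDimensionCount) {
        if (dataDimensionCount < 0 || indexDimensionCount < 0) {
            throw new IllegalArgumentException("dimension counts must be >= 0");
        }
        if (dataDimensionCount > 0 && indexDimensionCount == 0) {
            throw new IllegalArgumentException("indexDimensionCount must be > 0 when dataDimensionCount > 0");
        }
        if (indexDimensionCount > dataDimensionCount) {
            throw new IllegalArgumentException("indexDimensionCount cannot exceed dataDimensionCount");
        }
    }

    /** Packs a triangle as [minX, minY, maxX, maxY, ax, ay, bx, by, cx, cy]. */
    static int[] encodeTriangle(int ax, int ay, int bx, int by, int cx, int cy) {
        int minX = Math.min(ax, Math.min(bx, cx));
        int minY = Math.min(ay, Math.min(by, cy));
        int maxX = Math.max(ax, Math.max(bx, cx));
        int maxY = Math.max(ay, Math.max(by, cy));
        return new int[] {minX, minY, maxX, maxY, ax, ay, bx, by, cx, cy};
    }

    /** Returns only the leading dimensions that a tree builder would look at. */
    static int[] indexedPart(int[] packed, int indexDimensionCount) {
        validate(packed.length, indexDimensionCount);
        return Arrays.copyOf(packed, indexDimensionCount);
    }

    /** Coarse filter: tests the query box against the 4 indexed values only. */
    static boolean boxIntersects(int[] packed, int qMinX, int qMinY, int qMaxX, int qMaxY) {
        return packed[0] <= qMaxX && packed[2] >= qMinX
            && packed[1] <= qMaxY && packed[3] >= qMinY;
    }

    public static void main(String[] args) {
        validate(DATA_DIMS, INDEX_DIMS);
        int[] packed = encodeTriangle(0, 0, 4, 1, 2, 5);
        System.out.println("packed  = " + Arrays.toString(packed));
        System.out.println("indexed = " + Arrays.toString(indexedPart(packed, INDEX_DIMS)));
        System.out.println("hit     = " + boxIntersects(packed, 3, 3, 10, 10));
    }
}
```

In this sketch the bounding-box test touches only the first four values, and the trailing vertex coordinates would be consulted afterwards for an exact triangle check. That is the R-Tree-like split the description refers to.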
