[jira] [Commented] (LUCENE-8496) Explore selective dimension indexing in BKDReader/Writer


JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16615249#comment-16615249 ]

Nicholas Knize commented on LUCENE-8496:
----------------------------------------

{quote}It is a pity that the patch is so large{quote}

Yeah. Refactoring {{pointDimensionCount}} touched a lot of files, so the patch is rather busy. I could instead leave {{pointDimensionCount}} as is and just add a new {{indexDimensionCount}}?

{quote}Out of curiosity, did your working copy already have LUCENE-7862 when you ran the benchmark?{quote}

Yes. My benchmark numbers include the latest change to store min/max packed values. The only difference between the two runs is whether {{LatLonShape}} uses the selective indexing approach.

{quote}...could you maybe set up a pull request or use Apache reviewboard{quote}

Sure thing! I went ahead and opened a PR [here|https://github.com/apache/lucene-solr/pull/451]

> Explore selective dimension indexing in BKDReader/Writer
> --------------------------------------------------------
>
>                 Key: LUCENE-8496
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8496
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Nicholas Knize
>            Priority: Major
>         Attachments: LUCENE-8496.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> This issue explores adding a new feature to BKDReader/Writer that lets users select fewer dimensions for creating the BKD index than the total number of dimensions specified for field encoding. This is useful for encoding dimensional data that is needed to interpret the encoded field data but is unnecessary (or inefficient) for creating the index structure. One such example is {{LatLonShape}} encoding. The first 4 dimensions may be used to efficiently search/index the triangle using its precomputed bounding box as a 4D point, and the remaining dimensions can be used to encode the vertices of the tessellated triangle. This causes BKD to act much like an R-Tree for shape data, where search is distilled into a 4D point (instead of a more expensive 6D point) and the triangle is encoded using a portion of the remaining (non-indexed) dimensions. Fields that use the full data range for indexing are not impacted and behave as they normally would.
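
The idea in the description can be illustrated with a small standalone sketch. This is not the patch and not the BKDReader/Writer API: the class name, the constants, and the 10-value layout (4 bounding-box values followed by 6 vertex coordinates) are assumptions made up for illustration. It only shows the split between dimensions the tree looks at and dimensions that merely ride along as data.

```java
/**
 * Hypothetical sketch of selective dimension indexing.
 * All names and the value layout are illustrative assumptions,
 * not Lucene's BKDWriter/BKDReader API or LatLonShape's real encoding.
 */
final class SelectiveDimsSketch {
    // Assumed layout: [minY, minX, maxY, maxX, ay, ax, by, bx, cy, cx]
    static final int INDEX_DIMS = 4;  // dimensions used to build and search the tree
    static final int DATA_DIMS = 10;  // total dimensions stored per value

    /** Encodes a triangle as its bounding box followed by its three vertices. */
    static int[] encodeTriangle(int ay, int ax, int by, int bx, int cy, int cx) {
        int minY = Math.min(ay, Math.min(by, cy));
        int minX = Math.min(ax, Math.min(bx, cx));
        int maxY = Math.max(ay, Math.max(by, cy));
        int maxX = Math.max(ax, Math.max(bx, cx));
        return new int[] {minY, minX, maxY, maxX, ay, ax, by, bx, cy, cx};
    }

    /**
     * Picks a split dimension by widest value range, looking only at the
     * indexed dimensions. The vertex dimensions never influence the tree.
     */
    static int chooseSplitDim(int[][] values) {
        int best = 0;
        long bestRange = -1;
        for (int d = 0; d < INDEX_DIMS; d++) {
            int min = Integer.MAX_VALUE;
            int max = Integer.MIN_VALUE;
            for (int[] v : values) {
                min = Math.min(min, v[d]);
                max = Math.max(max, v[d]);
            }
            long range = (long) max - min;
            if (range > bestRange) {
                bestRange = range;
                best = d;
            }
        }
        return best;
    }

    /** Tests a query rectangle against the bounding box held in the indexed dimensions. */
    static boolean bboxIntersects(int[] v, int qMinY, int qMinX, int qMaxY, int qMaxX) {
        return v[0] <= qMaxY && v[2] >= qMinY && v[1] <= qMaxX && v[3] >= qMinX;
    }

    public static void main(String[] args) {
        int[] t1 = encodeTriangle(0, 0, 10, 5, 3, 20);
        int[] t2 = encodeTriangle(1, 100, 2, 101, 3, 102);
        System.out.println("split dim: " + chooseSplitDim(new int[][] {t1, t2}));
        System.out.println("t1 hit: " + bboxIntersects(t1, 5, 5, 6, 6));
    }
}
```

In this sketch a query only touches the first 4 values while traversing; the 6 vertex values would be decoded afterwards to run the exact triangle test on candidates whose bounding box matched.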



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
