Improving DocValue sorting performance

Daniel Penning
Hi

I think sorting and searching on docvalues could benefit from optimizations similar to those that impacts provide when scoring is used. By storing the minimum and maximum value for each block of docvalues, it would be possible to skip over sets of documents that don't contain any values in the relevant range.
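
To make the per-block idea concrete, here is a minimal Java sketch. It assumes a dense numeric column with exactly one value per document and a fixed block size. `BlockMinMaxIndex`, `nextInRange` and `blocksSkipped` are hypothetical names, not Lucene APIs. `nextInRange` behaves roughly like an `advance(target)` call that a range clause could expose to a conjunction: it only reads values inside blocks whose [min, max] interval overlaps the query range.

```java
import java.util.Arrays;

// Hypothetical sketch: a dense numeric doc-values column (one long per doc,
// docIDs 0..values.length-1) with a min/max summary stored per fixed-size block.
final class BlockMinMaxIndex {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final long[] values;   // the per-document values
    private final int blockSize;   // documents per block (an assumed fixed size)
    private final long[] blockMin; // smallest value in each block
    private final long[] blockMax; // largest value in each block
    int blocksSkipped = 0;         // blocks skipped without reading their values

    BlockMinMaxIndex(long[] values, int blockSize) {
        this.values = values;
        this.blockSize = blockSize;
        int numBlocks = (values.length + blockSize - 1) / blockSize;
        blockMin = new long[numBlocks];
        blockMax = new long[numBlocks];
        Arrays.fill(blockMin, Long.MAX_VALUE);
        Arrays.fill(blockMax, Long.MIN_VALUE);
        for (int doc = 0; doc < values.length; doc++) {
            int b = doc / blockSize;
            blockMin[b] = Math.min(blockMin[b], values[doc]);
            blockMax[b] = Math.max(blockMax[b], values[doc]);
        }
    }

    // Returns the first docID >= target whose value lies in [lo, hi], or NO_MORE_DOCS.
    // Blocks whose [min, max] cannot overlap [lo, hi] are skipped as a whole.
    int nextInRange(int target, long lo, long hi) {
        int doc = target;
        while (doc < values.length) {
            int b = doc / blockSize;
            if (blockMax[b] < lo || blockMin[b] > hi) {
                blocksSkipped++;
                doc = (b + 1) * blockSize; // jump to the first doc of the next block
                continue;
            }
            if (values[doc] >= lo && values[doc] <= hi) {
                return doc;
            }
            doc++;
        }
        return NO_MORE_DOCS;
    }
}
```

For example, with blocks holding the values 1..4, 10..13 and 5..8 and a query range of [5, 8], the first two blocks are skipped without reading their values and doc 8 is returned.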

When sorting a large result set, this could be used to skip over parts of the index once the desired number of documents has been found, and the range of competitive values could then be narrowed down further step by step as better hits are collected. This could also improve the performance of docvalue-based range queries used in conjunctions, by only looking at blocks of documents that actually contain values in the requested range.
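
A similar sketch for the sorting case, again in Java and again with hypothetical names (`BlockSkippingTopK`, `topKAscending`): an ascending sort on one dense numeric field, visiting blocks in docID order. Once k hits have been collected, the largest value in the queue is the bottom that any new hit must beat, so a block whose minimum is not smaller than that bottom can be skipped entirely. It assumes ties are broken by ascending docID (so an equal value in a later block is not competitive) and returns only values, where a real collector would also track docIDs.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.PriorityQueue;

// Hypothetical sketch: top-k ascending sort that skips blocks using per-block minimums.
final class BlockSkippingTopK {
    int blocksSkipped = 0; // blocks never scanned because they could not be competitive

    // Returns the k smallest values in ascending order.
    long[] topKAscending(long[] values, int blockSize, int k) {
        if (k <= 0) {
            return new long[0];
        }
        int numBlocks = (values.length + blockSize - 1) / blockSize;
        long[] blockMin = new long[numBlocks];
        Arrays.fill(blockMin, Long.MAX_VALUE);
        for (int doc = 0; doc < values.length; doc++) {
            int b = doc / blockSize;
            blockMin[b] = Math.min(blockMin[b], values[doc]);
        }
        // Max-heap of the k best (smallest) values so far; its head is the current bottom.
        PriorityQueue<Long> queue = new PriorityQueue<>(Collections.reverseOrder());
        for (int b = 0; b < numBlocks; b++) {
            // With ties broken by docID, a later value equal to the bottom cannot enter.
            if (queue.size() == k && blockMin[b] >= queue.peek()) {
                blocksSkipped++;
                continue;
            }
            int end = Math.min(values.length, (b + 1) * blockSize);
            for (int doc = b * blockSize; doc < end; doc++) {
                if (queue.size() < k) {
                    queue.add(values[doc]);
                } else if (values[doc] < queue.peek()) {
                    queue.poll();           // evict the current bottom
                    queue.add(values[doc]); // the threshold tightens
                }
            }
        }
        long[] result = new long[queue.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = queue.poll(); // largest comes out first, so fill from the end
        }
        return result;
    }
}
```

Because the bottom only moves downward as the queue fills with better hits, later blocks become increasingly likely to be skipped, which matches the step-by-step narrowing the proposal describes.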

Currently this is just an idea I had when I looked at the impacts implementation, and I wanted to get your opinion on it before I spend time building a proof-of-concept implementation.

--

Daniel Penning / Senior Product Developer
T.: +49 (0)30 5900113-83 / F.: +49 (0)30 5900113-99
E-Mail: [hidden email]
Web: www.stroeermediabrands.de

Ströer Media Brands AG, Torstraße 49, 10119 Berlin-Mitte
Executive Board: Marc Schmitz
Commercial register: Amtsgericht Berlin-Charlottenburg HRB 126603 B

Re: Improving DocValue sorting performance

Alan Woodward
+1. This would be a very nice addition, and Toke’s recent work adding jump tables to docvalues would provide a natural place to store the information.

On 11 Feb 2019, at 12:42, Daniel Penning <[hidden email]> wrote:
