Parallel merge of indexes

Parallel merge of indexes

eakarsu
I need some help merging indexes faster, in parallel. I am using the IndexMergeTool provided by Lucene, but it seems very slow. Is there a way to speed up the process?

What I do is create 16 shards with no replication, then add a replica for every node and every shard. In the last step, I merge the indexes. The first two steps finish quickly, but the last merging step takes a long time.

I appreciate your help.

Erol Akarsu

Re: Parallel merge of indexes

eakarsu
Here is some timing information. I am dealing with large product records; I have 5 million products.
Indexing without replicas into 16 shards: 20 minutes
Adding replicas: 5 minutes
Index merging with IndexMergeTool: 40 minutes

On Tue, Feb 4, 2020 at 6:23 PM Erol Akarsu <[hidden email]> wrote:
> I need some help in merging indexes in parallel much faster way. [...]

Erol Akarsu

Re: Parallel merge of indexes

Erick Erickson
_Why_ are you trying to merge indexes? On the surface this doesn’t
make much sense.

You start with 16 shards. Your ZooKeeper configuration will show that
each shard has 1/16 of the hash range (based on the <uniqueKey>). What
are you merging? Are you merging all the segments on each shard?
Merging the indexes from the separate shards?? If the latter, your
bookkeeping in ZooKeeper will be totally messed up.
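
To make the point about hash ranges concrete, here is a minimal Java sketch that splits the signed 32-bit hash space into 16 contiguous ranges, one per shard, and looks up which shard a given hash falls into. It rests on stated assumptions: the class and method names (HashRangeSketch, partition, shardFor) are hypothetical and not Solr API; String.hashCode() is only a stand-in for the MurmurHash3 hash that Solr's compositeId router computes from the <uniqueKey>; and giving any remainder to the last range is this sketch's own simplification.

```java
import java.util.ArrayList;
import java.util.List;

public class HashRangeSketch {

    /** Inclusive range of 32-bit hash values owned by one shard (hypothetical type). */
    static final class Range {
        final int min;
        final int max;

        Range(int min, int max) {
            this.min = min;
            this.max = max;
        }

        boolean includes(int hash) {
            return hash >= min && hash <= max;
        }

        @Override
        public String toString() {
            // %x prints negative ints as unsigned two's-complement hex.
            return String.format("%08x-%08x", min, max);
        }
    }

    /** Splits the full signed 32-bit hash space into numShards contiguous ranges. */
    static List<Range> partition(int numShards) {
        if (numShards < 1) {
            throw new IllegalArgumentException("numShards must be >= 1");
        }
        long fullSpan = 1L << 32;           // 2^32 possible hash values
        long size = fullSpan / numShards;   // width of each range
        List<Range> ranges = new ArrayList<>();
        long start = Integer.MIN_VALUE;
        for (int i = 0; i < numShards; i++) {
            // Assumption: the last range absorbs any remainder so the whole space is covered.
            long end = (i == numShards - 1) ? Integer.MAX_VALUE : start + size - 1;
            ranges.add(new Range((int) start, (int) end));
            start = end + 1;
        }
        return ranges;
    }

    /** Returns the zero-based index of the shard whose range contains the hash. */
    static int shardFor(int hash, List<Range> ranges) {
        for (int i = 0; i < ranges.size(); i++) {
            if (ranges.get(i).includes(hash)) {
                return i;
            }
        }
        throw new IllegalStateException("hash not covered: " + hash);
    }

    public static void main(String[] args) {
        List<Range> ranges = partition(16);
        for (int i = 0; i < ranges.size(); i++) {
            System.out.println("shard" + (i + 1) + " " + ranges.get(i));
        }
        // Stand-in hash only; Solr itself hashes the <uniqueKey> with MurmurHash3.
        int hash = "product-42".hashCode();
        System.out.println("product-42 -> shard" + (shardFor(hash, ranges) + 1));
    }
}
```

Because every document belongs to exactly one range, combining the indexes of different shards would leave documents sitting outside the range ZooKeeper records for the shard that holds them, which is the bookkeeping problem described above.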

Or is this really an optimize, i.e. you’re trying to merge all the segments
on each shard down to a single segment so in the end you still have 16
shards, each with a single segment?

This sounds like an XY problem. You’re trying to accomplish some
end goal and asking how to do Y, without explaining the actual
problem you’re trying to solve, the X.

Perhaps if you give us some background we can suggest alternatives.

Best,
Erick

> On Feb 4, 2020, at 6:58 PM, Erol Akarsu <[hidden email]> wrote:
>
> I can give time information. I am dealing with big product records. I have 5 million products
> Indexing without replica with 16 shards : 20 minutes
> Add replicas : 5 minutes
> Index merging with IndexMergeTool  : 40 minutes


Re: Parallel merge of indexes

Robert Muir
On Tue, Feb 4, 2020 at 7:37 PM Erick Erickson <[hidden email]> wrote:
> Or is this really an optimize, i.e. you’re trying to merge all the segments
> on each shard down to a single segment so in the end you still have 16
> shards, each with a single segment?

The IndexMergeTool that Erol is using does a forceMerge(1) at the end: https://s.apache.org/n1l24

Erol, does it take forever after the MergeTool prints "Full Merge..."?

It would be nice if this tool had better options and defaults.