Optimizing DocumentDictionary build on SolrCloud suggester

diwakar bhardwaj
I have around 300,000 records to load into a SolrCloud suggester.
These records are dynamic, i.e. new documents will be added and some
documents will be deleted on a regular basis. I see two options, and
both have problems:

1. Use FileDictionaryFactory: this method is an operational nightmare. I
would need to keep regenerating the file and uploading it to ZooKeeper (I
still haven't figured out how to upload a file this large to ZooKeeper),
and I might need to build the index separately on each server in the
SolrCloud cluster. Doing this frequently does not seem feasible.

2. Use DocumentDictionaryFactory: this method seems like the obvious choice,
but building the index here is a nightmare as well. Every time I try to
build the index, I get the "No space left on the device" error. A build on
just 5K records did succeed, but it took 40 minutes and consumed all 10GB
of memory for the entire 40 minutes.
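
To put the 5K-record test in perspective, here is a rough extrapolation to
the full data set. It is only a sketch: it assumes build time grows linearly
with the number of records, which I have not measured, and the function name
is just for illustration. The three input figures are the ones given above.

```python
# Back-of-the-envelope estimate. ASSUMPTION: build time scales linearly
# with record count; the real behaviour may be better or worse.
SAMPLE_RECORDS = 5_000    # size of the test build that succeeded
SAMPLE_MINUTES = 40       # how long that test build took
TOTAL_RECORDS = 300_000   # approximate size of the full data set


def estimate_build_minutes(total_records, sample_records, sample_minutes):
    """Scale a measured build time linearly to a larger record count."""
    return total_records / sample_records * sample_minutes


minutes = estimate_build_minutes(TOTAL_RECORDS, SAMPLE_RECORDS, SAMPLE_MINUTES)
print(f"{minutes:.0f} minutes (~{minutes / 60:.0f} hours)")
```

Under that assumption a full build would take on the order of 40 hours,
which is not workable for data that changes regularly.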

My question is: can the index build time be optimized if we follow the
second approach? Or, if I follow the first approach, what is the ideal way
of handling frequent changes to the indexed data on SolrCloud?

Thanks and regards,
Diwakar Bhardwaj