Continuously creating index packages for Katta with Solr
I'd like to use Solr to create indices for deployment with Katta. The plan is to
install a Solr server on each crawler; the crawling script then sends the
content directly to the local Solr server. Every 5-10 minutes I'd like to take
the current Solr index, hand it over to Katta, and let Solr start again with an
empty index. Does anybody have an idea how this could be achieved?
I'm currently evaluating the following solution: my crawler sends all docs to
a Solr core named "WHATEVER". Every 5 minutes a new Solr core with the same
name WHATEVER is created, but with a new dataDir whose name contains a
timestamp. I can then look for dataDirs that are older than the newest one;
all of those can be picked up for submission to Katta.
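To make the rotation concrete, here is a minimal sketch of the two pieces: building a CoreAdmin CREATE request with a timestamped dataDir, and selecting the dataDirs that are safe to hand to Katta. The Solr URL, base directory, and core layout are assumptions for illustration, not taken from my actual setup.

```python
import time
import urllib.parse

# Hypothetical local Solr instance; adjust host/port to your crawler node.
SOLR_ADMIN = "http://localhost:8983/solr/admin/cores"

def new_core_params(core_name="WHATEVER", base_dir="/data/solr"):
    """Build a CoreAdmin CREATE URL for a core whose dataDir carries a
    creation timestamp, so older dirs can be identified later."""
    stamp = time.strftime("%Y%m%d%H%M%S")
    data_dir = f"{base_dir}/{core_name}-{stamp}"
    params = {
        "action": "CREATE",
        "name": core_name,
        "instanceDir": core_name,
        "dataDir": data_dir,
    }
    return SOLR_ADMIN + "?" + urllib.parse.urlencode(params)

def older_datadirs(datadirs):
    """Return every dataDir except the newest one; timestamped names sort
    lexicographically, so the last entry is the core still being written."""
    return sorted(datadirs)[:-1]
```

A cron job could call `new_core_params()` every 5 minutes, issue the request, and then feed `older_datadirs()` of the base directory listing to the Katta submission script.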
Two questions remain:
- When the old core is closed, is there an implicit commit?
- How can I be sure that no more work is in progress on an old core's dataDir?
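I don't know the answer to either question, but one defensive approach would be not to rely on an implicit commit at all: issue an explicit commit against the old core and then UNLOAD it via the CoreAdmin API before touching its dataDir. A sketch of the two requests (the base URL is an assumption; whether UNLOAD alone guarantees no in-flight writes is exactly the open question):

```python
import urllib.parse

# Hypothetical local Solr instance.
SOLR_BASE = "http://localhost:8983/solr"

def commit_url(core_name):
    # Explicit commit so nothing hinges on an implicit commit at close time.
    return f"{SOLR_BASE}/{core_name}/update?commit=true"

def unload_url(core_name):
    # UNLOAD closes the core; the assumption is that once it returns,
    # Solr no longer writes to the old dataDir.
    params = urllib.parse.urlencode({"action": "UNLOAD", "core": core_name})
    return f"{SOLR_BASE}/admin/cores?{params}"
```

The rotation script would hit `commit_url()` and `unload_url()` on the old core before picking up its dataDir for Katta.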