IndexMerger and non-nutch Lucene indexes

Brian Whitman
We use Solr to inject Lucene documents that were not crawled by Nutch
into the main Nutch index. This works fine: we can search both the
Nutch docs and the Solr-injected docs with one query (not using the
Nutch searcher).

However, I would like to use the IndexMerger for merging successive  
Nutch crawls. If one of the index directories we give bin/nutch merge  
has Solr-generated Lucene docs in it, we get:

2007-01-26 02:49:34,093 INFO  indexer.IndexMerger - merging indexes to: crawl/index
2007-01-26 02:49:34,094 INFO  indexer.IndexMerger - Adding crawl/index_07_01_25_20_33_15/part-00000
2007-01-26 02:49:34,102 INFO  indexer.IndexMerger - Adding crawl/index/_0.fnm
2007-01-26 02:49:34,106 FATAL indexer.IndexMerger - IndexMerger: java.io.IOException: crawl/index/_0.fnm not a directory
         at org.apache.nutch.indexer.FsDirectory.<init>(FsDirectory.java:44)
         at org.apache.nutch.indexer.IndexMerger.merge(IndexMerger.java:82)
         at org.apache.nutch.indexer.IndexMerger.run(IndexMerger.java:150)
         at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
         at org.apache.nutch.indexer.IndexMerger.main(IndexMerger.java:113)
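
Judging from the log, the merger appears to treat every entry under a
path it is given as a separate index directory, so a flat Lucene index
(files such as _0.fnm sitting directly in crawl/index) makes it fail.
A small pre-flight check can show which of the paths passed to the
merge contain plain files. This is only a sketch: the class and method
names are hypothetical, it is not part of Nutch, and it uses nothing
beyond java.io.File on the local filesystem.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Nutch: reports entries directly under
// a directory that are plain files rather than subdirectories.
public class MergeInputCheck {

    // Returns the names of the entries directly under indexesDir that are
    // not directories, sorted by pathname. Returns an empty list if the
    // path does not exist or is not itself a directory.
    static List<String> nonDirectoryEntries(File indexesDir) {
        List<String> offenders = new ArrayList<>();
        File[] children = indexesDir.listFiles();
        if (children == null) {
            return offenders;
        }
        Arrays.sort(children); // File is Comparable; gives a stable order
        for (File child : children) {
            if (!child.isDirectory()) {
                offenders.add(child.getName());
            }
        }
        return offenders;
    }

    public static void main(String[] args) {
        for (String arg : args) {
            List<String> offenders = nonDirectoryEntries(new File(arg));
            if (offenders.isEmpty()) {
                System.out.println(arg + ": contains only subdirectories");
            } else {
                System.out.println(arg + ": plain files found: " + offenders);
            }
        }
    }
}
```

Run against the same paths given to bin/nutch merge, this would be
expected to flag crawl/index, since the log shows _0.fnm directly
inside it.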


Any way around this?