Failing to index from Nutch 1.12 to Solr 5.5.3



Chip Calhoun
I'm switching to more recent versions of Nutch and Solr after years of using Nutch 1.4 and Solr 3.3.0. When I index into Solr, I get no results, and I can't tell where the process breaks down.

I use these commands:
cd /opt/apache-nutch-1.12/runtime/local
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-
export NUTCH_CONF_DIR=/opt/apache-nutch-1.12/runtime/local/conf/phfaws
bin/crawl urls/phfaws crawl/phfaws 1
bin/nutch solrindex http://localhost:8983/solr/phfaws/ crawl/phfaws/crawldb -linkdb crawl/phfaws/linkdb crawl/phfaws/segments/*

I believe that Nutch is crawling properly, although the crawl folders end up only about 25% as large as the ones I produced with Nutch 1.4. I suspect that the problem is with the Nutch/Solr integration. My Solr core didn't create a schema.xml; it has a managed schema instead. I've copied the schema.xml from my Nutch local conf into Solr, but I haven't found any indication that I'm supposed to do anything more with it.
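
One way to separate the two suspicions (crawl versus indexing) might be to compare what the crawldb reports as fetched with what the Solr core actually holds. The following is only a rough sketch: the core URL and crawl directory are taken from the commands above, while the helper names solr_num_found and check_pipeline are hypothetical and not part of Nutch or Solr.

```shell
#!/bin/sh
# Sketch only. SOLR_URL and CRAWL_DIR mirror the commands earlier in this post;
# the function names below are made up for illustration.
SOLR_URL="http://localhost:8983/solr/phfaws"
CRAWL_DIR="crawl/phfaws"

# Read a Solr JSON select response on stdin and print its numFound value.
# Prints nothing if the input contains no numFound field.
solr_num_found() {
    sed -n 's/.*"numFound":[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Print crawldb statistics, then the number of documents in the Solr core.
# Meant to be run from the Nutch runtime/local directory.
check_pipeline() {
    # Crawl side: status counts (fetched, unfetched, gone, ...) for the crawldb.
    bin/nutch readdb "$CRAWL_DIR/crawldb" -stats
    # Index side: match-all query with rows=0, so only the total count comes back.
    curl -s "$SOLR_URL/select?q=*:*&rows=0&wt=json" | solr_num_found
}
```

After sourcing the file, calling check_pipeline from /opt/apache-nutch-1.12/runtime/local should print both sides. If the crawldb shows fetched pages while the Solr count stays at 0, the crawl is probably fine and the problem is more likely in the indexing step or the schema setup.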

Chip Calhoun
Digital Archivist
Niels Bohr Library & Archives
American Institute of Physics
One Physics Ellipse
College Park, MD  20740