I'm not sure about 1.x, but for 2.x, setting up Solr 6.6 is the same as in the 1.x tutorial. The only catch is that the schema.xml in 1.x has not been updated for Solr 6.6.0, so you will see some errors when you first start it up, but you can easily find solutions on Stack Overflow.
Also, you might need to delete the managed-schema file after updating schema.xml so that Solr re-reads the config.
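For what it's worth, here is a rough sketch of that schema swap. The function name install_nutch_schema is made up for illustration, and the core conf path in the usage comment is only an assumption about where a standalone Solr 6.6 install under /opt/solr keeps the core, so adjust both to your layout.

```shell
# Hypothetical helper (not part of Nutch or Solr): copy Nutch's schema.xml
# into a Solr core's conf directory and drop the stale managed-schema so
# Solr rebuilds it from schema.xml on the next start.
install_nutch_schema() {
  nutch_conf="$1"   # directory holding Nutch's schema.xml, e.g. /opt/nutch/conf
  core_conf="$2"    # the core's conf directory (assumed layout, see usage below)

  if [ ! -f "$nutch_conf/schema.xml" ]; then
    echo "no schema.xml found in $nutch_conf" >&2
    return 1
  fi

  cp "$nutch_conf/schema.xml" "$core_conf/schema.xml" || return 1
  rm -f "$core_conf/managed-schema"
}

# Usage (paths are assumptions, not taken from a verified install):
#   install_nutch_schema /opt/nutch/conf \
#     /opt/solr/server/solr/nutch_solr_data_core/conf
```

After that, restart Solr (something like /opt/solr/bin/solr restart -p 8983) and check the log for any remaining schema errors before re-running the index step.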
Hope this helps!
> On Aug 4, 2017, at 04:44, Ray Crawford <[hidden email]> wrote:
> I think I have Nutch set up right (Nutch 1.13 and Solr 6.6.0). When I try
> to crawl stuff and send it to Solr, it doesn't seem to be getting any
> content. Here's the code I'm using to get web content and push it to Solr:
> mkdir -p /opt/nutch/urls
> echo 'http://www.with-impact.com' > /opt/nutch/urls/seed.txt
> vi /opt/nutch/conf/regex-urlfilter.txt
> # +.
> export JAVA_HOME='/etc/alternatives/jre_1.8.0'
> /opt/solr/bin/solr create -c nutch_solr_data_core
> /opt/nutch/bin/nutch inject crawl/crawldb urls/seed.txt
> cd /opt/nutch
> /opt/nutch/bin/nutch generate crawl/crawldb crawl/segments
> s1=`ls -d /opt/nutch/crawl/segments/2* | tail -1`
> /opt/nutch/bin/nutch fetch $s1
> /opt/nutch/bin/nutch parse $s1
> /opt/nutch/bin/nutch updatedb crawl/crawldb $s1
> /opt/nutch/bin/nutch invertlinks crawl/linkdb -dir crawl/segments
> /opt/nutch/bin/nutch solrindex http://localhost:8983/solr/nutch_solr_data_core crawl/crawldb/ -linkdb crawl/linkdb/ $s1
> Am I missing a step?
> I wouldn't mind using nutch 2, but I didn't see a good tutorial for Nutch
> 2/Solr 6 integration. Can anyone point me to one?