nutch 0.7.2 webapp on resin3

Paul Dhaliwal
I tried out the Nutch webapp on Resin 3, and it had issues.

The first issue was that I got nothing but a 500 Servlet Error and the
word "null" when I tried to search.

I hadn't followed my own suggestions, which I had posted here:
http://wiki.apache.org/nutch/GettingNutchRunningWithResin

After I changed the system properties, it was fine.

It took me a while to realize that the XML parser was causing the issues.
I tried debugging OnlineClustererFactory's getOnlineCluster, but the problem
is that it never gets there. The issue comes up when OnlineClustererFactory's
X_POINT static member is being loaded. This meant that search.jsp's servlet
never loaded and Resin was always trying to compile it. Since I didn't have
Java 1.4 logging configured properly, I didn't see many error messages either.
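Never reaching getOnlineCluster fits how the JVM treats a static field whose initializer throws. The sketch below uses a hypothetical `Factory` class with an `X_POINT` field as a stand-in for OnlineClustererFactory; it is not Nutch's code, and the thrown exception is an assumed placeholder for the XML parser failure. The first use fails with `ExceptionInInitializerError`, and every later use fails with `NoClassDefFoundError` without re-running the initializer, so the method body is never entered.

```java
// Sketch only: Factory and its X_POINT field are hypothetical stand-ins for
// OnlineClustererFactory, not Nutch code.
class StaticInitDemo {

    static class Factory {
        // Initialized while the class is being initialized; if this throws,
        // the class never finishes initializing.
        static final Object X_POINT = lookupExtensionPoint();

        static Object lookupExtensionPoint() {
            // Assumed failure, standing in for the XML parser problem.
            throw new IllegalStateException("XML parser failure");
        }

        static Object getInstance() {
            // Never reached: the static initializer fails first.
            return X_POINT;
        }
    }

    // Calls Factory.getInstance() and reports which Throwable escaped.
    static String attempt() {
        try {
            Factory.getInstance();
            return "ok";
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(attempt()); // first use: ExceptionInInitializerError
        System.out.println(attempt()); // later uses: NoClassDefFoundError
    }
}
```

Because later failures do not re-run the initializer, the original cause is easiest to see at the first failure, which is why having logging in place when the class first loads helps.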

However, I was able to get it to run by adding just the following two lines
to the Resin conf (they tell Resin to use Xerces instead of Resin's own XML
parser):

    <system-property javax.xml.parsers.DocumentBuilderFactory="org.apache.xerces.jaxp.DocumentBuilderFactoryImpl"/>
    <system-property javax.xml.parsers.SAXParserFactory="org.apache.xerces.jaxp.SAXParserFactoryImpl"/>
and this line gave me a little more information about what was going on:
    <system-property java.util.logging.config.file='/home/paul/java1.4logging.conf'/>
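To confirm which parser JAXP actually picks up, a small check like the following may help. It is a sketch, not part of Nutch or Resin, and `JaxpCheck` is a hypothetical class name. `DocumentBuilderFactory.newInstance()` and `SAXParserFactory.newInstance()` consult the `javax.xml.parsers.DocumentBuilderFactory` and `javax.xml.parsers.SAXParserFactory` system properties before falling back to other lookups, which is what the `<system-property>` lines above rely on.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical diagnostic: reports the configured JAXP properties and the
// factory classes that newInstance() actually returns in this JVM.
class JaxpCheck {

    static final String DOM_KEY = "javax.xml.parsers.DocumentBuilderFactory";
    static final String SAX_KEY = "javax.xml.parsers.SAXParserFactory";

    // One line per factory: property value (or "<unset>") -> resolved class.
    static String report() {
        String dom = DocumentBuilderFactory.newInstance().getClass().getName();
        String sax = SAXParserFactory.newInstance().getClass().getName();
        return DOM_KEY + " = " + System.getProperty(DOM_KEY, "<unset>") + " -> " + dom + "\n"
             + SAX_KEY + " = " + System.getProperty(SAX_KEY, "<unset>") + " -> " + sax;
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Running it with `-Djavax.xml.parsers.DocumentBuilderFactory=org.apache.xerces.jaxp.DocumentBuilderFactoryImpl` and Xerces on the classpath should show the override take effect; if the named class cannot be found, `newInstance()` throws `FactoryConfigurationError` instead.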

Another issue I ran into was that most of the language directories are
missing footer.html, yet search.jsp expects footer.html to be in the
language directory.

Towards the end of search.jsp you see:
<jsp:include page="<%= language + "/include/footer.html"%>"/>

I had to change it to
<jsp:include page="/include/footer.html"/>

which seems to be the right thing to do regardless, as footer.html only
exists in that directory and does not seem to have any language-specific
"stuff" in it.

FileNotFoundException on crawl

Michael Levy-3
I'm running Nutch 0.7.2 under Solaris 9 with Java 1.5.0_06. I followed the
Nutch version 0.8 tutorial and am getting the FileNotFoundException shown
below. Any ideas? Thanks.

# bin/nutch crawl urls -dir crawl -depth 3 -topN 50
060413 150039 parsing file:/export/home/www/virtual/wiki/doc_root/nutch-0.7.2/conf/nutch-default.xml
060413 150040 parsing file:/export/home/www/virtual/wiki/doc_root/nutch-0.7.2/conf/crawl-tool.xml
060413 150041 parsing file:/export/home/www/virtual/wiki/doc_root/nutch-0.7.2/conf/nutch-site.xml
060413 150041 No FS indicated, using default:local
060413 150041 crawl started in: crawl-20060413150041
060413 150041 rootUrlFile = urls -dir crawl -depth 3 -topN 50
060413 150041 threads = 10
060413 150041 depth = 5
060413 150043 Created webdb at LocalFS,/export/home/www/virtual/wiki/doc_root/nutch-0.7.2/crawl-20060413150041/db
Exception in thread "main" java.io.FileNotFoundException: urls -dir crawl -depth 3 -topN 50 (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:106)
        at java.io.FileReader.<init>(FileReader.java:55)
        at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:372)
        at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
        at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
# Exception in thread "main" java.io.FileNotFoundException
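The log line `rootUrlFile = urls -dir crawl -depth 3 -topN 50` and the exception message name the same string, which suggests the options were taken as part of the root URL file name rather than as separate arguments (the log also reports `depth = 5` rather than 3). The sketch below only mirrors the `FileReader` call at the top of the stack trace; `RootUrlFileCheck` and `tryOpen` are hypothetical names, not Nutch code.

```java
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical reproduction: open a file name with FileReader, as the stack
// trace shows WebDBInjector.injectURLFile doing, and report the outcome.
class RootUrlFileCheck {

    // Returns "opened" if the file can be read, otherwise the exception message.
    static String tryOpen(String rootUrlFile) {
        try (FileReader reader = new FileReader(rootUrlFile)) {
            return "opened";
        } catch (FileNotFoundException e) {
            return e.getMessage();
        } catch (IOException e) {
            return "I/O error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // The whole argument string as a single file name, exactly as logged.
        System.out.println(tryOpen("urls -dir crawl -depth 3 -topN 50"));
    }
}
```

On a Unix-like system this should print a message of the same form as the one in the trace, naming the full string followed by "(No such file or directory)".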