Fwd: no results from search, nutch 0.8


Chris Newton
  Hi guys, I've got a problem with the version of Nutch I pulled from SVN
two days ago. I followed the tutorial instructions (plus some additional
searching when I hit roadblocks) and got to what seems to be a working
setup. However, every search returns '0-0 results of 0 matching'.

  I have:

  ~/seeds/urls populated with a couple of base URLs
  root@demon:~/seeds# more urls
  http://www.apache.org


  ~/crawl has crawldb and linkdb in it...
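At search time, NutchBean (see the Tomcat log further down) reports opening `indexes`, `segments` and `linkdb` under /root/crawl, so those are the directories worth checking for in addition to crawldb. The following is a minimal shell sketch, not part of Nutch (the function name `check_crawl_dir` is hypothetical), that reports which of those three subdirectories exist:

```shell
# Hypothetical helper (not part of Nutch): report whether a crawl
# directory contains the three subdirectories NutchBean opens at
# search time according to the Tomcat log: indexes, segments, linkdb.
# Prints one "ok:" or "missing:" line per subdirectory and returns 0
# only if all three exist.
check_crawl_dir() {
  dir="$1"
  status=0
  for sub in indexes segments linkdb; do
    if [ -d "$dir/$sub" ]; then
      echo "ok:      $dir/$sub"
    else
      echo "missing: $dir/$sub"
      status=1
    fi
  done
  return "$status"
}
```

Run it as `check_crawl_dir /root/crawl`; a non-zero exit status means at least one of the three directories is absent.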

  I did both of the following:
  - edited searcher.dir in nutch-default.xml to point to /root/crawl
  AND
  - edited searcher.dir in
/usr/share/tomcat5/webapps/ROOT/WEB-INF/classes/nutch-default.xml to point
to /root/crawl, then restarted Tomcat.

  - edited crawl-urlfilter.txt so that the URLs I'm crawling don't get
filtered out; the last line is: +.

  - With everything started, when I try to search I get:

Hits *0-0* (out of about 0 total matching pages):

RSS: http://localhost:8180/opensearch?query=apache&hitsPerSite=2&lang=en&hitsPerPage=10

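Each non-comment line of crawl-urlfilter.txt starts with `+` (include) or `-` (exclude) followed by a regular expression; the rules are tried top to bottom, the first pattern that matches decides, and a URL that matches no rule is rejected. That is why a final `+.` accepts any URL not excluded by an earlier line. Below is a rough shell emulation of that evaluation, not Nutch code: `url_filter_verdict` is a hypothetical name, and POSIX `grep -E` stands in for Java regular expressions, so unusual patterns may behave differently than in Nutch itself.

```shell
# Rough emulation (an assumption, not Nutch code) of how a
# crawl-urlfilter.txt style rule file is applied to one URL:
#   lines starting with '#' and blank lines are ignored,
#   '+regex' accepts and '-regex' rejects,
#   the first rule whose regex matches anywhere in the URL wins,
#   a URL that matches no rule is rejected.
url_filter_verdict() {
  rules="$1"
  url="$2"
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      '+'*) verdict=accept ;;
      '-'*) verdict=reject ;;
      *) continue ;;        # comments and blank lines
    esac
    pattern=${line#?}       # strip the leading +/- sign
    if printf '%s\n' "$url" | grep -Eq -- "$pattern"; then
      echo "$verdict"
      return 0
    fi
  done < "$rules"
  echo reject               # no rule matched
}
```

For example, `url_filter_verdict path/to/crawl-urlfilter.txt http://www.google.com/` prints `accept` or `reject` for that URL.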



Thanks for any help you guys can give....

Chris

Logs follow:

  /root/nutch/trunk/logs/nutch.log

2006-06-16 13:15:38,422 INFO  crawl.Injector - Injector: starting
2006-06-16 13:15:38,423 INFO  crawl.Injector - Injector: crawlDb: crawl/crawldb
2006-06-16 13:15:38,423 INFO  crawl.Injector - Injector: urlDir: seeds
2006-06-16 13:15:38,985 INFO  crawl.Injector - Injector: Converting injected urls to crawl db entries.
2006-06-16 13:15:39,764 INFO  net.UrlNormalizerFactory - Using URL normalizer: org.apache.nutch.net.BasicUrlNormalizer
2006-06-16 13:15:39,817 INFO  plugin.PluginRepository - Plugins: looking in: /root/nutch/trunk/build/plugins
2006-06-16 13:15:40,153 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
2006-06-16 13:15:40,153 INFO  plugin.PluginRepository - Registered Plugins:

.... snip plugins

2006-06-16 13:15:46,320 INFO  crawl.Generator - Generator: Partitioning selected urls by host, for politeness.
2006-06-16 13:15:48,145 INFO  crawl.Generator - Generator: done.
2006-06-16 13:15:50,422 INFO  fetcher.Fetcher - Fetcher: starting
2006-06-16 13:15:50,423 INFO  fetcher.Fetcher - Fetcher: segment: crawl/segments/20060616131544
2006-06-16 13:15:51,305 INFO  fetcher.Fetcher - Fetcher: threads: 10
2006-06-16 13:15:51,372 INFO  plugin.PluginRepository - Plugins: looking in: /root/nutch/trunk/build/plugins
2006-06-16 13:15:51,736 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
2006-06-16 13:15:51,737 INFO  plugin.PluginRepository - Registered Plugins:

... snip plugins

2006-06-16 13:15:51,794 INFO  net.UrlNormalizerFactory - Using URL normalizer: org.apache.nutch.net.BasicUrlNormalizer
2006-06-16 13:15:51,821 INFO  fetcher.Fetcher - fetching http://www.apache.org/
2006-06-16 13:15:51,853 INFO  http.Http - http.proxy.host = null
2006-06-16 13:15:51,853 INFO  http.Http - http.proxy.port = 8080
2006-06-16 13:15:51,853 INFO  http.Http - http.timeout = 10000
2006-06-16 13:15:51,853 INFO  http.Http - http.content.limit = 65536
2006-06-16 13:15:51,853 INFO  http.Http - http.agent = NutchCVS/0.8-dev (Nutch; http://lucene.apache.org/nutch/bot.html; [hidden email])
2006-06-16 13:15:51,853 INFO  http.Http - fetcher.server.delay = 1000
2006-06-16 13:15:51,853 INFO  http.Http - http.max.delays = 100
2006-06-16 13:15:51,853 INFO  http.Http - http.proxy.host = null
2006-06-16 13:15:51,853 INFO  http.Http - http.proxy.port = 8080
2006-06-16 13:15:51,853 INFO  http.Http - http.timeout = 10000
2006-06-16 13:15:51,853 INFO  http.Http - http.content.limit = 65536
2006-06-16 13:15:51,854 INFO  http.Http - http.agent = NutchCVS/0.8-dev (Nutch; http://lucene.apache.org/nutch/bot.html; [hidden email])
2006-06-16 13:15:51,854 INFO  http.Http - fetcher.server.delay = 1000
2006-06-16 13:15:51,854 INFO  http.Http - http.max.delays = 100
2006-06-16 13:15:53,607 INFO  crawl.SignatureFactory - Using Signature impl: org.apache.nutch.crawl.MD5Signature
2006-06-16 13:15:55,278 INFO  fetcher.Fetcher - Fetcher: done
2006-06-16 13:15:55,837 INFO  crawl.CrawlDb - CrawlDb update: starting
2006-06-16 13:15:55,838 INFO  crawl.CrawlDb - CrawlDb update: db: crawl/crawldb
2006-06-16 13:15:55,838 INFO  crawl.CrawlDb - CrawlDb update: segment: crawl/segments/20060616131544
2006-06-16 13:15:56,881 INFO  crawl.CrawlDb - CrawlDb update: Merging segment data into db.
2006-06-16 13:15:57,760 INFO  plugin.PluginRepository - Plugins: looking in: /root/nutch/trunk/build/plugins
2006-06-16 13:15:58,288 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
2006-06-16 13:15:58,289 INFO  plugin.PluginRepository - Registered Plugins:

... snip plugins

2006-06-16 13:15:58,608 INFO  crawl.CrawlDb - CrawlDb update: done
2006-06-16 13:16:00,047 INFO  crawl.Generator - topN: 100
2006-06-16 13:16:01,818 INFO  crawl.Generator - Generator: starting
2006-06-16 13:16:01,819 INFO  crawl.Generator - Generator: segment: crawl/segments/20060616131601
2006-06-16 13:16:01,819 INFO  crawl.Generator - Generator: Selecting most-linked urls due for fetch.
2006-06-16 13:16:02,578 INFO  plugin.PluginRepository - Plugins: looking in: /root/nutch/trunk/build/plugins
2006-06-16 13:16:03,042 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
2006-06-16 13:16:03,043 INFO  plugin.PluginRepository - Registered Plugins:
2006-06-16 13:16:03,043 INFO  plugin.PluginRepository -         CyberNeko HTML Parser (lib-nekohtml)

... snip plugins

2006-06-16 13:16:03,524 INFO  crawl.Generator - Generator: Partitioning selected urls by host, for politeness.
2006-06-16 13:16:05,067 INFO  crawl.Generator - Generator: done.
2006-06-16 13:16:06,759 INFO  fetcher.Fetcher - Fetcher: starting
2006-06-16 13:16:06,760 INFO  fetcher.Fetcher - Fetcher: segment: crawl/segments/20060616131601
2006-06-16 13:16:07,495 INFO  fetcher.Fetcher - Fetcher: threads: 10
2006-06-16 13:16:07,515 INFO  plugin.PluginRepository - Plugins: looking in: /root/nutch/trunk/build/plugins
2006-06-16 13:16:07,944 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
2006-06-16 13:16:07,944 INFO  plugin.PluginRepository - Registered Plugins:

... snip plugins

2006-06-16 13:16:08,049 INFO  fetcher.Fetcher - fetching http://www.apache.org/dev/version-control.html
2006-06-16 13:16:08,058 INFO  fetcher.Fetcher - fetching http://jakarta.apache.org/
2006-06-16 13:16:08,058 INFO  fetcher.Fetcher - fetching http://logging.apache.org/
2006-06-16 13:16:08,058 INFO  fetcher.Fetcher - fetching http://ws.apache.org/
2006-06-16 13:16:08,059 INFO  fetcher.Fetcher - fetching http://www.apache.org/foundation/events.html
2006-06-16 13:16:08,059 INFO  fetcher.Fetcher - fetching http://www.apache-ssl.org/
2006-06-16 13:16:08,059 INFO  fetcher.Fetcher - fetching http://www.indians.org/welker/apache.htm
2006-06-16 13:16:08,060 INFO  fetcher.Fetcher - fetching http://ant.apache.org/
2006-06-16 13:16:08,061 INFO  fetcher.Fetcher - fetching http://james.apache.org/
2006-06-16 13:16:08,061 INFO  fetcher.Fetcher - fetching http://cocoon.apache.org/
2006-06-16 13:16:08,117 INFO  http.Http - http.proxy.host = null
2006-06-16 13:16:08,117 INFO  http.Http - http.proxy.port = 8080
2006-06-16 13:16:08,117 INFO  http.Http - http.timeout = 10000
2006-06-16 13:16:08,117 INFO  http.Http - http.content.limit = 65536
2006-06-16 13:16:08,117 INFO  http.Http - http.agent = NutchCVS/0.8-dev (Nutch; http://lucene.apache.org/nutch/bot.html; [hidden email])
2006-06-16 13:16:08,117 INFO  http.Http - fetcher.server.delay = 1000
2006-06-16 13:16:08,117 INFO  http.Http - http.max.delays = 100
2006-06-16 13:16:08,117 INFO  http.Http - http.proxy.host = null
2006-06-16 13:16:08,117 INFO  http.Http - http.proxy.port = 8080
2006-06-16 13:16:08,117 INFO  http.Http - http.timeout = 10000
2006-06-16 13:16:08,117 INFO  http.Http - http.content.limit = 65536
2006-06-16 13:16:08,117 INFO  http.Http - http.agent = NutchCVS/0.8-dev (Nutch; http://lucene.apache.org/nutch/bot.html; [hidden email])
2006-06-16 13:16:08,117 INFO  http.Http - fetcher.server.delay = 1000
2006-06-16 13:16:08,117 INFO  http.Http - http.max.delays = 100
2006-06-16 13:16:08,146 INFO  http.Http - http.proxy.host = null
2006-06-16 13:16:08,146 INFO  http.Http - http.proxy.port = 8080
2006-06-16 13:16:08,146 INFO  http.Http - http.timeout = 10000
2006-06-16 13:16:08,146 INFO  http.Http - http.content.limit = 65536
2006-06-16 13:16:08,146 INFO  http.Http - http.agent = NutchCVS/0.8-dev (Nutch; http://lucene.apache.org/nutch/bot.html; [hidden email])
2006-06-16 13:16:08,146 INFO  http.Http - fetcher.server.delay = 1000
2006-06-16 13:16:08,147 INFO  http.Http - http.max.delays = 100
2006-06-16 13:16:08,147 INFO  http.Http - http.proxy.host = null
2006-06-16 13:16:08,147 INFO  http.Http - http.proxy.port = 8080
2006-06-16 13:16:08,147 INFO  http.Http - http.timeout = 10000
2006-06-16 13:16:08,147 INFO  http.Http - http.content.limit = 65536
2006-06-16 13:16:08,147 INFO  http.Http - http.agent = NutchCVS/0.8-dev (Nutch; http://lucene.apache.org/nutch/bot.html; [hidden email])
2006-06-16 13:16:08,147 INFO  http.Http - fetcher.server.delay = 1000
2006-06-16 13:16:08,147 INFO  http.Http - http.max.delays = 100
2006-06-16 13:16:08,147 INFO  http.Http - http.proxy.host = null
2006-06-16 13:16:08,147 INFO  http.Http - http.proxy.port = 8080
2006-06-16 13:16:08,147 INFO  http.Http - http.timeout = 10000
2006-06-16 13:16:08,147 INFO  http.Http - http.content.limit = 65536


.................................... lots of that sorta stuff, deleted


... snippy snippy

2006-06-16 13:17:43,267 INFO  crawl.Generator - Generator: Partitioning selected urls by host, for politeness.
2006-06-16 13:17:44,794 INFO  crawl.Generator - Generator: done.
2006-06-16 13:17:45,930 INFO  fetcher.Fetcher - Fetcher: starting
2006-06-16 13:17:45,931 INFO  fetcher.Fetcher - Fetcher: segment: crawl/segments/20060616131739
2006-06-16 13:17:47,283 INFO  fetcher.Fetcher - Fetcher: threads: 10
2006-06-16 13:17:47,301 INFO  plugin.PluginRepository - Plugins: looking in: /root/nutch/trunk/build/plugins
2006-06-16 13:17:47,592 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
2006-06-16 13:17:47,593 INFO  plugin.PluginRepository - Registered Plugins:

 yowzers on the snip snip

2006-06-16 13:17:47,728 INFO  fetcher.Fetcher - fetching http://www.logilune.com/
2006-06-16 13:17:47,730 INFO  fetcher.Fetcher - fetching http://www.google.com/
2006-06-16 13:17:47,731 INFO  fetcher.Fetcher - fetching http://www.ics.uci.edu/
2006-06-16 13:17:47,731 INFO  fetcher.Fetcher - fetching http://www.informatica.com/
2006-06-16 13:17:47,733 INFO  fetcher.Fetcher - fetching http://www.indexgeo.com.au/
2006-06-16 13:17:47,733 INFO  fetcher.Fetcher - fetching http://www.coopermcgregor.com/
2006-06-16 13:17:47,733 INFO  fetcher.Fetcher - fetching http://www.unixguide.net/freebsd/faq/16.19.shtml
2006-06-16 13:17:47,733 INFO  fetcher.Fetcher - fetching http://avalon.apache.org/
2006-06-16 13:17:47,734 INFO  fetcher.Fetcher - fetching http://www.lotus.com/
2006-06-16 13:17:47,734 INFO  fetcher.Fetcher - fetching http://www.indexgeo.com.au/apache/
2006-06-16 13:17:47,764 INFO  http.Http - http.proxy.host = null
2006-06-16 13:17:47,764 INFO  http.Http - http.proxy.port = 8080
2006-06-16 13:17:47,764 INFO  http.Http - http.timeout = 10000
2006-06-16 13:17:47,765 INFO  http.Http - http.content.limit = 65536
2006-06-16 13:17:47,765 INFO  http.Http - http.agent = NutchCVS/0.8-dev (Nutch; http://lucene.apache.org/nutch/bot.html; [hidden email])
2006-06-16 13:17:47,765 INFO  http.Http - fetcher.server.delay = 1000
2006-06-16 13:17:47,765 INFO  http.Http - http.max.delays = 100
2006-06-16 13:17:47,765 INFO  http.Http - http.proxy.host = null
2006-06-16 13:17:47,765 INFO  http.Http - http.proxy.port = 8080
2006-06-16 13:17:47,765 INFO  http.Http - http.timeout = 10000
2006-06-16 13:17:47,765 INFO  http.Http - http.content.limit = 65536
2006-06-16 13:17:47,765 INFO  http.Http - http.agent = NutchCVS/0.8-dev (Nutch; http://lucene.apache.org/nutch/bot.html


................... more deleted


2006-06-16 13:19:32,827 INFO  fetcher.Fetcher - Fetcher: done
2006-06-16 13:19:33,272 INFO  crawl.CrawlDb - CrawlDb update: starting
2006-06-16 13:19:33,272 INFO  crawl.CrawlDb - CrawlDb update: db: crawl/crawldb
2006-06-16 13:19:33,273 INFO  crawl.CrawlDb - CrawlDb update: segment: crawl/segments/20060616131739
2006-06-16 13:19:33,850 INFO  crawl.CrawlDb - CrawlDb update: Merging segment data into db.
2006-06-16 13:19:34,677 INFO  plugin.PluginRepository - Plugins: looking in: /root/nutch/trunk/build/plugins
2006-06-16 13:19:34,975 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]

... snip-a-nator

(org.apache.nutch.searcher.QueryFilter)
2006-06-16 13:19:35,461 INFO  crawl.CrawlDb - CrawlDb update: done
2006-06-16 13:19:37,424 INFO  crawl.LinkDb - LinkDb: starting
2006-06-16 13:19:37,425 INFO  crawl.LinkDb - LinkDb: linkdb: crawl/linkdb
2006-06-16 13:19:37,539 INFO  crawl.LinkDb - LinkDb: adding segment: crawl/segments/20060616131544
2006-06-16 13:19:37,540 INFO  crawl.LinkDb - LinkDb: adding segment: crawl/segments/20060616131601
2006-06-16 13:19:37,540 INFO  crawl.LinkDb - LinkDb: adding segment: crawl/segments/20060616131739
2006-06-16 13:19:41,545 INFO  crawl.LinkDb - LinkDb: done
2006-06-16 13:19:42,058 INFO  indexer.Indexer - Indexer: starting
2006-06-16 13:19:42,059 INFO  indexer.Indexer - Indexer: linkdb: crawl/segments/20060616131601
2006-06-16 13:19:42,577 INFO  indexer.Indexer - Indexer: adding segment: crawl/segments/20060616131739
2006-06-16 13:19:44,662 INFO  plugin.PluginRepository - Plugins: looking in: /root/nutch/trunk/build/plugins
2006-06-16 13:19:45,035 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
.... getting quite snippy....

2006-06-16 13:19:45,045 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2006-06-16 13:19:45,759 INFO  indexer.Indexer - Optimizing index.
2006-06-16 13:19:46,288 INFO  indexer.Indexer - Indexer: done

  - Tomcat startup looks like:

INFO: Server startup in 6025 ms
16-Jun-2006 1:19:47 PM org.apache.coyote.http11.Http11Protocol pause
INFO: Pausing Coyote HTTP/1.1 on http-8180
16-Jun-2006 1:19:48 PM org.apache.catalina.core.StandardService stop
INFO: Stopping service Tomcat-Standalone
16-Jun-2006 1:19:48 PM org.apache.catalina.core.StandardHostDeployer remove
INFO: Removing web application at context path /admin
16-Jun-2006 1:19:48 PM org.apache.catalina.logger.LoggerBase stop
INFO: unregistering logger Catalina:type=Logger,path=/admin,host=localhost
16-Jun-2006 1:19:48 PM org.apache.catalina.core.StandardHostDeployer remove
INFO: Removing web application at context path /webdav
16-Jun-2006 1:19:48 PM org.apache.catalina.core.StandardHostDeployer remove
INFO: Removing web application at context path /servlets-examples
16-Jun-2006 1:19:48 PM org.apache.catalina.core.StandardHostDeployer remove
INFO: Removing web application at context path /jsp-examples
16-Jun-2006 1:19:48 PM org.apache.catalina.core.StandardHostDeployer remove
INFO: Removing web application at context path /balancer
16-Jun-2006 1:19:48 PM org.apache.catalina.core.StandardHostDeployer remove
INFO: Removing web application at context path /tomcat-docs
16-Jun-2006 1:19:48 PM org.apache.catalina.core.StandardHostDeployer remove
INFO: Removing web application at context path /nutch
16-Jun-2006 1:19:48 PM org.apache.catalina.core.StandardHostDeployer remove
INFO: Removing web application at context path
16-Jun-2006 1:19:48 PM org.apache.catalina.core.StandardHostDeployer remove
INFO: Removing web application at context path /manager
16-Jun-2006 1:19:48 PM org.apache.catalina.logger.LoggerBase stop
INFO: unregistering logger Catalina:type=Logger,host=localhost
16-Jun-2006 1:19:48 PM org.apache.catalina.logger.LoggerBase stop
INFO: unregistering logger Catalina:type=Logger
16-Jun-2006 1:19:48 PM org.apache.coyote.http11.Http11Protocol destroy
INFO: Stopping Coyote HTTP/1.1 on http-8180
16-Jun-2006 1:19:49 PM org.apache.coyote.http11.Http11Protocol init
INFO: Initializing Coyote HTTP/1.1 on http-8180
16-Jun-2006 1:19:49 PM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 1366 ms
16-Jun-2006 1:19:49 PM org.apache.catalina.core.StandardService start
INFO: Starting service Tomcat-Standalone
16-Jun-2006 1:19:49 PM org.apache.catalina.core.StandardEngine start
INFO: Starting Servlet Engine: Apache Tomcat/5.0
16-Jun-2006 1:19:49 PM org.apache.catalina.core.StandardHost start
INFO: XML validation disabled
16-Jun-2006 1:19:49 PM org.apache.catalina.core.StandardHost getDeployer
INFO: Create Host deployer for direct deployment ( non-jmx )
16-Jun-2006 1:19:49 PM org.apache.catalina.core.StandardHostDeployer install
INFO: Processing Context configuration file URL
file:/var/lib/tomcat5/conf/Catalina/localhost/balancer.xml
16-Jun-2006 1:19:50 PM org.apache.catalina.core.StandardHostDeployer install
INFO: Processing Context configuration file URL
file:/var/lib/tomcat5/conf/Catalina/localhost/admin.xml
16-Jun-2006 1:19:52 PM org.apache.catalina.core.StandardHostDeployer install
INFO: Processing Context configuration file URL
file:/var/lib/tomcat5/conf/Catalina/localhost/manager.xml
16-Jun-2006 1:19:52 PM org.apache.catalina.core.StandardHostDeployer install
INFO: Processing Context configuration file URL
file:/var/lib/tomcat5/conf/Catalina/localhost/tomcat-docs.xml
16-Jun-2006 1:19:52 PM org.apache.catalina.core.StandardHostDeployer install
INFO: Installing web application at context path  from URL
file:/usr/share/tomcat5/webapps/ROOT
16-Jun-2006 1:19:52 PM org.apache.catalina.core.StandardHostDeployer install
INFO: Installing web application at context path /nutch from URL
file:/usr/share/tomcat5/webapps/nutch
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: / (Is a directory)
        at java.io.FileOutputStream.openAppend(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:177)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:102)
        at org.apache.log4j.FileAppender.setFile(FileAppender.java:289)
        at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:163)
        at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:215)
        at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:256)
        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:132)
        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:96)
        at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:654)
        at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:612)
        at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:509)
        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:415)
        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:441)
        at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:468)
        at org.apache.log4j.LogManager.<clinit>(LogManager.java:122)
        at org.apache.log4j.Logger.getLogger(Logger.java:104)
        at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:229)
        at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:65)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
        at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:529)
        at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:235)
        at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:209)
        at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:351)
        at org.apache.commons.beanutils.ConvertUtilsBean.<init>(ConvertUtilsBean.java:130)
        at org.apache.commons.beanutils.BeanUtilsBean.<init>(BeanUtilsBean.java:110)
        at org.apache.commons.beanutils.BeanUtilsBean$1.initialValue(BeanUtilsBean.java:68)
        at org.apache.commons.beanutils.ContextClassLoaderLocal.get(ContextClassLoaderLocal.java:80)
        at org.apache.commons.beanutils.BeanUtilsBean.getInstance(BeanUtilsBean.java:78)
        at org.apache.commons.beanutils.ConvertUtilsBean.getInstance(ConvertUtilsBean.java:115)
        at org.apache.commons.beanutils.ConvertUtils.convert(ConvertUtils.java:217)
        at org.apache.commons.digester.CallMethodRule.end(CallMethodRule.java:561)
        at org.apache.commons.digester.Rule.end(Rule.java:230)
        at org.apache.commons.digester.Digester.endElement(Digester.java:1163)
        at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
        at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:595)
        at org.apache.catalina.core.StandardHostDeployer.install(StandardHostDeployer.java:277)
        at org.apache.catalina.core.StandardHost.install(StandardHost.java:832)
        at org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:625)
        at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:431)
        at org.apache.catalina.startup.HostConfig.start(HostConfig.java:983)
        at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:349)
        at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
        at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1091)
        at org.apache.catalina.core.StandardHost.start(StandardHost.java:789)
        at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1083)
        at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:478)
        at org.apache.catalina.core.StandardService.start(StandardService.java:480)
        at org.apache.catalina.core.StandardServer.start(StandardServer.java:2313)
        at org.apache.catalina.startup.Catalina.start(Catalina.java:556)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:287)
        at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:425)
log4j:ERROR Either File or DatePattern options are not set for appender [DRFA].
16-Jun-2006 1:19:53 PM org.apache.catalina.core.StandardHostDeployer install
INFO: Installing web application at context path /servlets-examples from URL
file:/var/lib/tomcat5/webapps/servlets-examples
16-Jun-2006 1:19:53 PM org.apache.catalina.core.StandardHostDeployer install
INFO: Installing web application at context path /jsp-examples from URL
file:/var/lib/tomcat5/webapps/jsp-examples
16-Jun-2006 1:19:53 PM org.apache.catalina.core.StandardHostDeployer install
INFO: Installing web application at context path /webdav from URL
file:/var/lib/tomcat5/webapps/webdav
16-Jun-2006 1:19:53 PM org.apache.coyote.http11.Http11Protocol start
INFO: Starting Coyote HTTP/1.1 on http-8180
16-Jun-2006 1:19:53 PM org.apache.jk.common.ChannelSocket init
INFO: JK2: ajp13 listening on /0.0.0.0:8009
16-Jun-2006 1:19:53 PM org.apache.jk.server.JkMain start
INFO: Jk running ID=0 time=1/24
config=/usr/share/tomcat5/conf/jk2.properties
16-Jun-2006 1:19:53 PM org.apache.catalina.startup.Catalina start
INFO: Server startup in 4395 ms
2006-06-16 13:21:27,508 INFO  Configuration - parsing file:/usr/share/tomcat5/work/Catalina/localhost/_/loader/hadoop-default.xml
2006-06-16 13:21:27,527 INFO  Configuration - parsing file:/var/lib/tomcat5/webapps/ROOT/WEB-INF/classes/nutch-default.xml
2006-06-16 13:21:27,540 INFO  Configuration - parsing file:/var/lib/tomcat5/webapps/ROOT/WEB-INF/classes/nutch-site.xml
2006-06-16 13:21:27,557 INFO  Configuration - parsing file:/var/lib/tomcat5/webapps/ROOT/WEB-INF/classes/hadoop-site.xml
2006-06-16 13:21:27,587 INFO  PluginRepository - Plugins: looking in: /var/lib/tomcat5/webapps/ROOT/WEB-INF/classes/plugins
2006-06-16 13:21:27,839 INFO  PluginRepository - Plugin Auto-activation mode: [true]
2006-06-16 13:21:27,839 INFO  PluginRepository - Registered Plugins:
2006-06-16 13:21:27,839 INFO  PluginRepository -        CyberNeko HTML Parser (lib-nekohtml)
2006-06-16 13:21:27,839 INFO  PluginRepository -        Site Query Filter (query-site)
2006-06-16 13:21:27,839 INFO  PluginRepository -        Html Parse Plug-in (parse-html)
2006-06-16 13:21:27,839 INFO  PluginRepository -        Regex URL Filter Framework (lib-regex-filter)
2006-06-16 13:21:27,839 INFO  PluginRepository -        Basic Indexing Filter (index-basic)
2006-06-16 13:21:27,839 INFO  PluginRepository -        Basic Summarizer Plug-in (summary-basic)
2006-06-16 13:21:27,839 INFO  PluginRepository -        Text Parse Plug-in (parse-text)
2006-06-16 13:21:27,839 INFO  PluginRepository -        JavaScript Parser (parse-js)
2006-06-16 13:21:27,839 INFO  PluginRepository -        Regex URL Filter (urlfilter-regex)
2006-06-16 13:21:27,840 INFO  PluginRepository -        Basic Query Filter (query-basic)
2006-06-16 13:21:27,840 INFO  PluginRepository -        HTTP Framework (lib-http)
2006-06-16 13:21:27,840 INFO  PluginRepository -        URL Query Filter (query-url)
2006-06-16 13:21:27,840 INFO  PluginRepository -        Http Protocol Plug-in (protocol-http)
2006-06-16 13:21:27,840 INFO  PluginRepository -        the nutch core extension points (nutch-extensionpoints)
2006-06-16 13:21:27,840 INFO  PluginRepository -        OPIC Scoring Plug-in (scoring-opic)
2006-06-16 13:21:27,840 INFO  PluginRepository - Registered Extension-Points:
2006-06-16 13:21:27,840 INFO  PluginRepository -        Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
2006-06-16 13:21:27,840 INFO  PluginRepository -        Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2006-06-16 13:21:27,840 INFO  PluginRepository -        Nutch Protocol (org.apache.nutch.protocol.Protocol)
2006-06-16 13:21:27,840 INFO  PluginRepository -        Nutch URL Filter (org.apache.nutch.net.URLFilter)
2006-06-16 13:21:27,840 INFO  PluginRepository -        HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2006-06-16 13:21:27,840 INFO  PluginRepository -        Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2006-06-16 13:21:27,840 INFO  PluginRepository -        Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2006-06-16 13:21:27,840 INFO  PluginRepository -        Nutch Content Parser (org.apache.nutch.parse.Parser)
2006-06-16 13:21:27,840 INFO  PluginRepository -        Ontology Model Loader (org.apache.nutch.ontology.Ontology)
2006-06-16 13:21:27,840 INFO  PluginRepository -        Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
2006-06-16 13:21:27,840 INFO  PluginRepository -        Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
2006-06-16 13:21:27,874 INFO  NutchBean - creating new bean
2006-06-16 13:21:27,892 INFO  NutchBean - opening indexes in /root/crawl/indexes
2006-06-16 13:21:28,000 INFO  Configuration - found resource common-terms.utf8 at file:/var/lib/tomcat5/webapps/ROOT/WEB-INF/classes/common-terms.utf8
2006-06-16 13:21:28,016 INFO  NutchBean - opening segments in /root/crawl/segments
2006-06-16 13:21:28,042 INFO  SummarizerFactory - Using the first summarizer extension found: Basic Summarizer
2006-06-16 13:21:28,055 INFO  NutchBean - opening linkdb in /root/crawl/linkdb
2006-06-16 13:21:28,063 INFO  NutchBean - query request from 127.0.0.1
2006-06-16 13:21:28,095 INFO  NutchBean - query: apache
2006-06-16 13:21:28,096 INFO  NutchBean - lang: en
2006-06-16 13:21:28,279 INFO  NutchBean - searching for 20 raw hits
2006-06-16 13:21:28,373 INFO  NutchBean - total hits: 0
2006-06-16 13:27:24,024 INFO  NutchBean - query request from 127.0.0.1
2006-06-16 13:27:24,024 INFO  NutchBean - query: forrest
2006-06-16 13:27:24,024 INFO  NutchBean - lang: en
2006-06-16 13:27:24,026 INFO  NutchBean - searching for 20 raw hits
2006-06-16 13:27:24,027 INFO  NutchBean - total hits: 0
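The Tomcat log above shows the webapp parsing nutch-default.xml and then nutch-site.xml from /var/lib/tomcat5/webapps/ROOT/WEB-INF/classes/, and NutchBean then opening /root/crawl/indexes, which suggests the searcher.dir change was picked up. Because nutch-site.xml is loaded after nutch-default.xml, a searcher.dir set there would normally override the default file. To read the value back from either file, a small sed-based helper is enough; `searcher_dir_of` is a hypothetical name, and the sketch assumes the stock layout with `<name>` and `<value>` on separate lines.

```shell
# Hypothetical helper (not part of Nutch): print the <value> that
# follows <name>searcher.dir</name> in a Nutch/Hadoop-style XML
# configuration file. Assumes the <name> and <value> elements sit on
# separate lines, as in the stock files; prints nothing if the
# property is absent.
searcher_dir_of() {
  # Address range: from the searcher.dir <name> line to the next line
  # containing </value>; within it, print only the text inside <value>.
  sed -n '/<name>searcher\.dir<\/name>/,/<\/value>/s:.*<value>\(.*\)</value>.*:\1:p' "$1"
}
```

For example, `searcher_dir_of /var/lib/tomcat5/webapps/ROOT/WEB-INF/classes/nutch-site.xml` prints the override if the site file sets one, and nothing otherwise.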




--
Chris Newton
CTO Radian6
506-452-9039

