Problems Searching an Index with Nutch


Problems Searching an Index with Nutch

Erik Höschler
Hi,

I'm running Nutch-0.7.2. I created an index for my local LAN, which
consists of 45,000 pages.
I can inspect this index with Luke and everything looks fine. When I try
to run a search query with Nutch,
I see the following exception in my JBoss log file (at the end of the
log).


//Here I'm redeploying the Nutch.war archive...
2007-01-26 15:55:06,611 INFO  [org.jboss.web.tomcat.tc5.TomcatDeployer]
deploy, ctxPath=/nutch,
warUrl=file:/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/
2007-01-26 15:55:06,831 DEBUG [tomcat.localhost./nutch.Context] Starting
tomcat.localhost./nutch.Context
2007-01-26 15:55:06,832 DEBUG [tomcat.localhost./nutch.Context]
Configuring default Resources
2007-01-26 15:55:06,836 DEBUG [tomcat.localhost./nutch.Context]
Processing standard container startup
2007-01-26 15:55:06,844 DEBUG [tomcat.localhost./nutch.Context] Setting
deployment descriptor public ID to '-//Sun Microsystems, Inc.//DTD Web
Application 2.3//EN'
2007-01-26 15:55:06,862 DEBUG [tomcat.localhost./nutch.Context] Setting
deployment descriptor public ID to '-//Sun Microsystems, Inc.//DTD Web
Application 2.3//EN'
2007-01-26 15:55:06,866 DEBUG [tomcat.localhost./nutch.Context] Posting
standard context attributes
2007-01-26 15:55:06,866 DEBUG [tomcat.localhost./nutch.Context]
Configuring application event listeners
2007-01-26 15:55:06,866 DEBUG [tomcat.localhost./nutch.Context] Sending
application start events
2007-01-26 15:55:06,866 DEBUG [tomcat.localhost./nutch.Context] Starting
filters
2007-01-26 15:55:06,866 DEBUG [tomcat.localhost./nutch.Context]  
Starting filter 'CommonHeadersFilter'
2007-01-26 15:55:06,867 DEBUG [tomcat.localhost./nutch.Context] Starting
completed //Archive successfully loaded...?!?!
2007-01-26 15:55:06,867 DEBUG [tomcat.localhost./nutch.Context] Checking
for
jboss.web:j2eeType=WebModule,name=//localhost/nutch,J2EEApplication=none,J2EEServer=none


//Here I started a query in my web browser...
2007-01-26 15:55:53,585 INFO  [STDOUT] 070126 155553 parsing
file:/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/nutch-default.xml
2007-01-26 15:55:53,591 INFO  [STDOUT] 070126 155553 parsing
file:/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/nutch-site.xml
2007-01-26 15:55:53,599 INFO  [STDOUT] 070126 155553 Plugins: looking
in:
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins
2007-01-26 15:55:53,600 INFO  [STDOUT] 070126 155553 not including:
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/clustering-carrot2
2007-01-26 15:55:53,600 INFO  [STDOUT] 070126 155553 not including:
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/creativecommons
2007-01-26 15:55:53,600 INFO  [STDOUT] 070126 155553 parsing:
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/index-basic/plugin.xml
2007-01-26 15:55:53,607 INFO  [STDOUT] 070126 155553 impl:
point=org.apache.nutch.indexer.IndexingFilter
class=org.apache.nutch.indexer.basic.BasicIndexingFilter
2007-01-26 15:55:53,609 INFO  [STDOUT] 070126 155553 not including:
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/index-more
2007-01-26 15:55:53,609 INFO  [STDOUT] 070126 155553 not including:
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/language-identifier
2007-01-26 15:55:53,609 INFO  [STDOUT] 070126 155553 parsing:
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/nutch-extensionpoints/plugin.xml
2007-01-26 15:55:53,612 INFO  [STDOUT] 070126 155553 not including:
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/ontology
2007-01-26 15:55:53,612 INFO  [STDOUT] 070126 155553 not including:
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/parse-ext
2007-01-26 15:55:53,613 INFO  [STDOUT] 070126 155553 parsing:
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/parse-html/plugin.xml
2007-01-26 15:55:53,614 INFO  [STDOUT] 070126 155553 impl:
point=org.apache.nutch.parse.Parser
class=org.apache.nutch.parse.html.HtmlParser
2007-01-26 15:55:53,615 INFO  [STDOUT] 070126 155553 not including:
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/parse-js
2007-01-26 15:55:53,615 INFO  [STDOUT] 070126 155553 not including:
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/parse-msword
2007-01-26 15:55:53,615 INFO  [STDOUT] 070126 155553 not including:
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/parse-pdf
2007-01-26 15:55:53,615 INFO  [STDOUT] 070126 155553 not including:
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/parse-rss
2007-01-26 15:55:53,615 INFO  [STDOUT] 070126 155553 parsing:
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/parse-text/plugin.xml
2007-01-26 15:55:53,617 INFO  [STDOUT] 070126 155553 impl:
point=org.apache.nutch.parse.Parser
class=org.apache.nutch.parse.text.TextParser
2007-01-26 15:55:53,617 INFO  [STDOUT] 070126 155553 not including:
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/protocol-file
2007-01-26 15:55:53,618 INFO  [STDOUT] 070126 155553 not including:
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/protocol-ftp
2007-01-26 15:55:53,618 INFO  [STDOUT] 070126 155553 parsing:
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/protocol-http/plugin.xml
2007-01-26 15:55:53,619 INFO  [STDOUT] 070126 155553 impl:
point=org.apache.nutch.protocol.Protocol
class=org.apache.nutch.protocol.http.Http
2007-01-26 15:55:53,620 INFO  [STDOUT] 070126 155553 not including:
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/protocol-httpclient
2007-01-26 15:55:53,620 INFO  [STDOUT] 070126 155553 parsing:
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/query-basic/plugin.xml
2007-01-26 15:55:53,622 INFO  [STDOUT] 070126 155553 impl:
point=org.apache.nutch.searcher.QueryFilter
class=org.apache.nutch.searcher.basic.BasicQueryFilter
2007-01-26 15:55:53,622 INFO  [STDOUT] 070126 155553 not including:
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/query-more
2007-01-26 15:55:53,622 INFO  [STDOUT] 070126 155553 parsing:
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/query-site/plugin.xml
2007-01-26 15:55:53,624 INFO  [STDOUT] 070126 155553 impl:
point=org.apache.nutch.searcher.QueryFilter
class=org.apache.nutch.searcher.site.SiteQueryFilter
2007-01-26 15:55:53,624 INFO  [STDOUT] 070126 155553 parsing:
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/query-url/plugin.xml
2007-01-26 15:55:53,626 INFO  [STDOUT] 070126 155553 impl:
point=org.apache.nutch.searcher.QueryFilter
class=org.apache.nutch.searcher.url.URLQueryFilter
2007-01-26 15:55:53,626 INFO  [STDOUT] 070126 155553 not including:
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/urlfilter-prefix
2007-01-26 15:55:53,626 INFO  [STDOUT] 070126 155553 parsing:
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/urlfilter-regex/plugin.xml
2007-01-26 15:55:53,628 INFO  [STDOUT] 070126 155553 impl:
point=org.apache.nutch.net.URLFilter
class=org.apache.nutch.net.RegexURLFilter
2007-01-26 15:55:53,639 INFO  [STDOUT] 070126 155553 10 creating new bean
2007-01-26 15:55:53,640 INFO  [STDOUT] 070126 155553 10 opening segment
indexes in /srv/opt/nutch-0.7.2/crawl.db/segments
2007-01-26 15:55:53,652 ERROR [org.jboss.web.localhost.Engine]
StandardWrapperValve[jsp]: Servlet.service() for servlet jsp threw exception
java.lang.ArrayIndexOutOfBoundsException



In my browser I got the following error:


  HTTP Status 500 -

------------------------------------------------------------------------

*type* Exception report

*message*

*description* _The server encountered an internal error () that
prevented it from fulfilling this request._

*exception*

org.apache.jasper.JasperException
        org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:372)
        org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:292)
        org.apache.jasper.servlet.JspServlet.service(JspServlet.java:236)
        javax.servlet.http.HttpServlet.service(HttpServlet.java:810)
        org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:75)

*root cause*

java.lang.ArrayIndexOutOfBoundsException

*note* _The full stack trace of the root cause is available in the
Apache Tomcat/5.0.28 logs._

------------------------------------------------------------------------


      Apache Tomcat/5.0.28



I also tested this search on a newly created index (a small one) but
got the same error. I also tried running Nutch-0.8.1, with the same result.
I couldn't find any information about this error, and now I don't
know what to do. Maybe you have an idea...

Thanks in advance...

Yours sincerely,
Erik H.

RE: Problems Searching an Index with Nutch

Gal Nitzan
Hi,

I'm not sure, but it seems to me you are missing the linkdb and segments
folders. They should be located on the same level as the index folder.
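
A quick way to confirm this is a small script that reports which of the expected folders are absent. This is only a sketch: the folder names come from the suggestion above, the function name `missing_siblings` is made up for illustration, and the layout may differ between Nutch versions (0.7 may use `db` rather than `linkdb`).

```python
from pathlib import Path

# Folder names taken from the suggestion above (an assumption, not a
# documented Nutch layout); adjust for your Nutch version.
EXPECTED_SIBLINGS = ("index", "segments", "linkdb")


def missing_siblings(crawl_dir, expected=EXPECTED_SIBLINGS):
    """Return the expected folder names that are not directories under crawl_dir."""
    root = Path(crawl_dir)
    return [name for name in expected if not (root / name).is_dir()]
```

For example, `missing_siblings("/opt/nutch/crawl.db")` returns the names from the list that do not exist as directories there; pass a different `expected` tuple if your version uses other names.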

HTH

Gal

-----Original Message-----
From: Erik Höschler [mailto:[hidden email]]
Sent: Friday, January 26, 2007 5:04 PM
To: [hidden email]
Cc: Erik
Subject: Problems Searching an Index with Nutch




Re: Problems Searching an Index with Nutch

Erik Höschler
Hi,

I checked my folder structure and everything seems to be correct...

:/opt/nutch/crawl.db# l
total 8
drwxr-xr-x   3 root root   53 2007-01-19 14:11 db
drwxr-xr-x   2 root root 4096 2007-01-19 14:18 index
drwxr-xr-x  12 root root 4096 2007-01-26 15:06 segments

I'm not sure if I've ever had a linkdb folder. Or did you mean the db
folder listed above?

Greetings,
Erik

Gal Nitzan wrote:

> Hi,
>
> I'm not sure but it seems to me you are missing the linkdb and segments
> folder. It should be located on the same level as the index folder.
>
> HTH/
>
> Gal
>


RE: Problems Searching an Index with Nutch

Gal Nitzan
Well, I guess that db is the linkdb for version 0.7.

Anyway, there is not much info here. Maybe you can find more in
catalina.out...
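
To pull the interesting entries out of a long log, something like the following sketch may help. It groups every ERROR entry with the untimestamped lines that follow it (where a stack trace would appear). The regular expression assumes the timestamp layout shown in the JBoss log above; catalina.out may use a different layout, so treat the pattern and the function name `error_blocks` as assumptions to adapt.

```python
import re

# Matches the start of an entry such as:
#   2007-01-26 15:55:53,652 ERROR [org.jboss.web.localhost.Engine]
# The layout is assumed from the log excerpt in this thread.
ENTRY = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\s+(\w+)\s+\[")


def error_blocks(lines):
    """Group each ERROR entry with the continuation lines that follow it."""
    blocks = []
    current = None
    for line in lines:
        match = ENTRY.match(line)
        if match:
            # A new timestamped entry closes any ERROR block in progress.
            if current is not None:
                blocks.append(current)
            current = [line.rstrip("\n")] if match.group(1) == "ERROR" else None
        elif current is not None:
            # Untimestamped line: belongs to the current ERROR entry.
            current.append(line.rstrip("\n"))
    if current is not None:
        blocks.append(current)
    return blocks
```

It can be fed an open file, for example `with open(path) as f: blocks = error_blocks(f)`, where `path` points at whichever log file you want to inspect.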

One more thing to look for, which just might be the reason (long shot): check
each of your segment folders and verify that it contains all five folders,
i.e. content, crawl_generate, crawl_parse, parse_data, parse_text.
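
With a dozen segments this check is tedious by hand, so here is a minimal sketch that automates it. The five folder names are copied from the list above; whether a 0.7.2 segment is expected to contain exactly these is an assumption, and the function name `incomplete_segments` is hypothetical.

```python
from pathlib import Path

# Subfolder names copied from the message above; the expected set may
# differ between Nutch versions (assumption).
REQUIRED = ("content", "crawl_generate", "crawl_parse", "parse_data", "parse_text")


def incomplete_segments(segments_dir, required=REQUIRED):
    """Map each segment folder name to the required subfolders it lacks."""
    report = {}
    for segment in sorted(Path(segments_dir).iterdir()):
        if not segment.is_dir():
            continue  # ignore stray files next to the segment folders
        missing = [name for name in required if not (segment / name).is_dir()]
        if missing:
            report[segment.name] = missing
    return report
```

An empty result from `incomplete_segments("/opt/nutch/crawl.db/segments")` would mean every segment has all the listed subfolders; otherwise the dictionary names the segments to look at first.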

HTH

Gal.

-----Original Message-----
From: Erik Höschler [mailto:[hidden email]]
Sent: Friday, January 26, 2007 5:58 PM
To: [hidden email]
Subject: Re: Problems Searching an Index with Nutch

Hi,

I checked my FolderStructure and everything seems to be correct...

:/opt/nutch/crawl.db# l
insgesamt 8
drwxr-xr-x   3 root root   53 2007-01-19 14:11 db
drwxr-xr-x   2 root root 4096 2007-01-19 14:18 index
drwxr-xr-x  12 root root 4096 2007-01-26 15:06 segments

I'm not sure if I've ever had a linkdb Folder or did you mean the db
folder listed above?

Greetings,
Erik



Re: Problems Searching an Index with Nutch

Erik Höschler
Ok,

I could not find any crawl_generate or crawl_parse folder. I also couldn't
find Catalina.out anywhere on my system?!

One thing I don't understand: Nutch itself should have created my
folder structure. If there is a fault in it, such as
the missing folders or the 'db' folder that should normally be
'linkdb', how can I fix this? I didn't change anything in
the structure myself, so it must have been created by Nutch
directly. Any idea how this could happen?

Thanks for your time ;)

--Erik



RE: Problems Searching an Index with Nutch

Gal Nitzan
Erik,

I worked with your version a long time ago (I now work with 0.9), so I'm
not sure I'm right about the crawl_generate and crawl_parse folders
in the segment structure.

However, two days ago I had that same exception when one of my segments was
missing its parse folder.

So maybe you need to parse the segments again (bin/nutch parse
segments/segmentname)
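
A rough sketch of how one might list the segments that need this, assuming
the parse output lives in parse_data and parse_text as mentioned earlier in
the thread. The helper name reparse_commands is hypothetical; it only builds
the command lines in the form quoted above and does not run anything:

```python
from pathlib import Path

# Parse output folders named earlier in the thread; treat them as an assumption.
PARSE_FOLDERS = ("parse_data", "parse_text")


def reparse_commands(segments_dir):
    """Return one 'bin/nutch parse' command per segment lacking parse output."""
    commands = []
    for segment in sorted(Path(segments_dir).iterdir()):
        if not segment.is_dir():
            continue  # only segment folders are of interest
        if not all((segment / name).is_dir() for name in PARSE_FOLDERS):
            commands.append(f"bin/nutch parse {segment}")
    return commands
```

Printing the returned lines first and running them by hand keeps the step
reviewable before anything touches the segments.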

HTH,

Gal.



-----Original Message-----
From: Erik Höschler [mailto:[hidden email]]
Sent: Friday, January 26, 2007 6:21 PM
To: [hidden email]
Subject: Re: Problems Searching an Index with Nutch

Ok,

I could not find any crawl_generate or crawl_parse Folder. Also I didn't
find Catalina.out on my whole System?!?!

One thing I won't understand is the fact that nutch should create my
folder structure. If there is a fault in it, just like
the missing folders or the 'db' folder which should normally be
'linkdb', how can I fix this. I didn't change anything at
the structure by my own so it must have been created by nutch
directly... Any idea how this could happen?

Thanks for your time ;)

--Erik

Gal Nitzan wrote:

> Well I guess that db is linkdb for ver 0.7 .
>
> Any way there is not much info maybe you can find more info in the
> Catalina.out ...
>
> One more thing to look for just maybe it is the reason (long shut)... check
> each of your segment folders and verify that it contains all the 5 folders
> i.e. content,crawl_generate,crawl_parse,parse_data,parse_text
>
> HTH
>
> Gal.
>
> -----Original Message-----
> From: Erik Höschler [mailto:[hidden email]]
> Sent: Friday, January 26, 2007 5:58 PM
> To: [hidden email]
> Subject: Re: Problems Searching an Index with Nutch
>
> Hi,
>
> I checked my FolderStructure and everything seems to be correct...
>
> :/opt/nutch/crawl.db# l
> insgesamt 8
> drwxr-xr-x   3 root root   53 2007-01-19 14:11 db
> drwxr-xr-x   2 root root 4096 2007-01-19 14:18 index
> drwxr-xr-x  12 root root 4096 2007-01-26 15:06 segments
>
> I'm not sure if I've ever had a linkdb Folder or did you mean the db
> folder listed above?
>
> Greetings,
> Erik
>
> Gal Nitzan wrote:
>  
>> Hi,
>>
>> I'm not sure but it seems to me you are missing the linkdb and segments
>> folder. It should be located on the same level as the index folder.
>>
>> HTH/
>>
>> Gal
>>
>> -----Original Message-----
>> From: Erik Höschler [mailto:[hidden email]]
>> Sent: Friday, January 26, 2007 5:04 PM
>> To: [hidden email]
>> Cc: Erik
>> Subject: Problems Searching an Index with Nutch
>>
>> Hi,
>>
>> I'm running Nutch-0.7.2. I created an Index for my local Lan which
>> consists of 45.000 Pages.
>> I can inspect this Index with Luke an everything looks fine. When I try
>> to start a search Query with Nutch
>> I can see the following Exception in my JBOSS Logfile (at the End of the
>> Log).
>>
>>
>> //Here I'm redploying the Nutch.war Archive....
>> 2007-01-26 15:55:06,611 INFO  [org.jboss.web.tomcat.tc5.TomcatDeployer]
>> deploy, ctxPath=/nutch,
>>
>>    
>
warUrl=file:/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/

>  
>> 2007-01-26 15:55:06,831 DEBUG [tomcat.localhost./nutch.Context] Starting
>> tomcat.localhost./nutch.Context
>> 2007-01-26 15:55:06,832 DEBUG [tomcat.localhost./nutch.Context]
>> Configuring default Resources
>> 2007-01-26 15:55:06,836 DEBUG [tomcat.localhost./nutch.Context]
>> Processing standard container startup
>> 2007-01-26 15:55:06,844 DEBUG [tomcat.localhost./nutch.Context] Setting
>> deployment descriptor public ID to '-//Sun Microsystems, Inc.//DTD Web
>> Application 2.3//EN'
>> 2007-01-26 15:55:06,862 DEBUG [tomcat.localhost./nutch.Context] Setting
>> deployment descriptor public ID to '-//Sun Microsystems, Inc.//DTD Web
>> Application 2.3//EN'
>> 2007-01-26 15:55:06,866 DEBUG [tomcat.localhost./nutch.Context] Posting
>> standard context attributes
>> 2007-01-26 15:55:06,866 DEBUG [tomcat.localhost./nutch.Context]
>> Configuring application event listeners
>> 2007-01-26 15:55:06,866 DEBUG [tomcat.localhost./nutch.Context] Sending
>> application start events
>> 2007-01-26 15:55:06,866 DEBUG [tomcat.localhost./nutch.Context] Starting
>> filters
>> 2007-01-26 15:55:06,866 DEBUG [tomcat.localhost./nutch.Context]  
>> Starting filter 'CommonHeadersFilter'
>> 2007-01-26 15:55:06,867 DEBUG [tomcat.localhost./nutch.Context] Starting
>> completed //Archive successfully loaded...?!?!
>> 2007-01-26 15:55:06,867 DEBUG [tomcat.localhost./nutch.Context] Checking
>> for
>>
>>    
>
jboss.web:j2eeType=WebModule,name=//localhost/nutch,J2EEApplication=none,J2E
>  
>> EServer=none
>>
>>
>> //Here I startet a query in my Webbrowser...
>> 2007-01-26 15:55:53,585 INFO  [STDOUT] 070126 155553 parsing
>>
>>    
>
file:/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF
>  
>> /classes/nutch-default.xml
>> 2007-01-26 15:55:53,591 INFO  [STDOUT] 070126 155553 parsing
>>
>>    
>
file:/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF
>  
>> /classes/nutch-site.xml
>> 2007-01-26 15:55:53,599 INFO  [STDOUT] 070126 155553 Plugins: looking
>> in:
>>
>>    
>
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/clas
>  
>> ses/plugins
>> 2007-01-26 15:55:53,600 INFO  [STDOUT] 070126 155553 not including:
>>
>>    
>
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/clas
>  
>> ses/plugins/clustering-carrot2
>> 2007-01-26 15:55:53,600 INFO  [STDOUT] 070126 155553 not including:
>>
>>    
>
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/clas
>  
>> ses/plugins/creativecommons
>> 2007-01-26 15:55:53,600 INFO  [STDOUT] 070126 155553 parsing:
>>
>>    
>
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/clas
>  
>> ses/plugins/index-basic/plugin.xml
>> 2007-01-26 15:55:53,607 INFO  [STDOUT] 070126 155553 impl:
>> point=org.apache.nutch.indexer.IndexingFilter
>> class=org.apache.nutch.indexer.basic.BasicIndexingFilter
>> 2007-01-26 15:55:53,609 INFO  [STDOUT] 070126 155553 not including:
>>
>>    
>
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/clas
>  
>> ses/plugins/index-more
>> 2007-01-26 15:55:53,609 INFO  [STDOUT] 070126 155553 not including:
>>
>>    
>
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/clas
>  
>> ses/plugins/language-identifier
>> 2007-01-26 15:55:53,609 INFO  [STDOUT] 070126 155553 parsing:
>>
>>    
>
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/clas
>  
>> ses/plugins/nutch-extensionpoints/plugin.xml
>> 2007-01-26 15:55:53,612 INFO  [STDOUT] 070126 155553 not including:
>>
>>    
>
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/clas
>  
>> ses/plugins/ontology
>> 2007-01-26 15:55:53,612 INFO  [STDOUT] 070126 155553 not including:
>>
>>    
>
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/clas
>  
>> ses/plugins/parse-ext
>> 2007-01-26 15:55:53,613 INFO  [STDOUT] 070126 155553 parsing:
>>
>>    
>
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/clas
>  
>> ses/plugins/parse-html/plugin.xml
>> 2007-01-26 15:55:53,614 INFO  [STDOUT] 070126 155553 impl:
>> point=org.apache.nutch.parse.Parser
>> class=org.apache.nutch.parse.html.HtmlParser
>> 2007-01-26 15:55:53,615 INFO  [STDOUT] 070126 155553 not including:
>>
>>    
>
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/clas
>  
>> ses/plugins/parse-js
>> 2007-01-26 15:55:53,615 INFO  [STDOUT] 070126 155553 not including:
>>
>>    
>
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/clas
>  
>> ses/plugins/parse-msword
>> 2007-01-26 15:55:53,615 INFO  [STDOUT] 070126 155553 not including:
>>
>>    
>
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/clas
>  
>> ses/plugins/parse-pdf
>> 2007-01-26 15:55:53,615 INFO  [STDOUT] 070126 155553 not including:
>>
>>    
>
/srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/clas
>  
>> ses/plugins/parse-rss
>> 2007-01-26 15:55:53,615 INFO  [STDOUT] 070126 155553 parsing:
>> /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/parse-text/plugin.xml
>> 2007-01-26 15:55:53,617 INFO  [STDOUT] 070126 155553 impl:
>> point=org.apache.nutch.parse.Parser
>> class=org.apache.nutch.parse.text.TextParser
>> 2007-01-26 15:55:53,617 INFO  [STDOUT] 070126 155553 not including:
>> /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/protocol-file
>> 2007-01-26 15:55:53,618 INFO  [STDOUT] 070126 155553 not including:
>> /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/protocol-ftp
>> 2007-01-26 15:55:53,618 INFO  [STDOUT] 070126 155553 parsing:
>> /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/protocol-http/plugin.xml
>> 2007-01-26 15:55:53,619 INFO  [STDOUT] 070126 155553 impl:
>> point=org.apache.nutch.protocol.Protocol
>> class=org.apache.nutch.protocol.http.Http
>> 2007-01-26 15:55:53,620 INFO  [STDOUT] 070126 155553 not including:
>> /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/protocol-httpclient
>> 2007-01-26 15:55:53,620 INFO  [STDOUT] 070126 155553 parsing:
>> /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/query-basic/plugin.xml
>> 2007-01-26 15:55:53,622 INFO  [STDOUT] 070126 155553 impl:
>> point=org.apache.nutch.searcher.QueryFilter
>> class=org.apache.nutch.searcher.basic.BasicQueryFilter
>> 2007-01-26 15:55:53,622 INFO  [STDOUT] 070126 155553 not including:
>> /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/query-more
>> 2007-01-26 15:55:53,622 INFO  [STDOUT] 070126 155553 parsing:
>> /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/query-site/plugin.xml
>> 2007-01-26 15:55:53,624 INFO  [STDOUT] 070126 155553 impl:
>> point=org.apache.nutch.searcher.QueryFilter
>> class=org.apache.nutch.searcher.site.SiteQueryFilter
>> 2007-01-26 15:55:53,624 INFO  [STDOUT] 070126 155553 parsing:
>> /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/query-url/plugin.xml
>> 2007-01-26 15:55:53,626 INFO  [STDOUT] 070126 155553 impl:
>> point=org.apache.nutch.searcher.QueryFilter
>> class=org.apache.nutch.searcher.url.URLQueryFilter
>> 2007-01-26 15:55:53,626 INFO  [STDOUT] 070126 155553 not including:
>> /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/urlfilter-prefix
>> 2007-01-26 15:55:53,626 INFO  [STDOUT] 070126 155553 parsing:
>> /srv/opt/jboss-3.2.6/server/ecs_cs/tmp/deploy/tmp31541nutch.war/WEB-INF/classes/plugins/urlfilter-regex/plugin.xml
>> 2007-01-26 15:55:53,628 INFO  [STDOUT] 070126 155553 impl:
>> point=org.apache.nutch.net.URLFilter
>> class=org.apache.nutch.net.RegexURLFilter
>> 2007-01-26 15:55:53,639 INFO  [STDOUT] 070126 155553 10 creating new bean
>> 2007-01-26 15:55:53,640 INFO  [STDOUT] 070126 155553 10 opening segment
>> indexes in /srv/opt/nutch-0.7.2/crawl.db/segments
>> 2007-01-26 15:55:53,652 ERROR [org.jboss.web.localhost.Engine]
>> StandardWrapperValve[jsp]: Servlet.service() for servlet jsp threw exception
>> java.lang.ArrayIndexOutOfBoundsException
>>
>> In my browser I got the following error...
>>
>>   HTTP Status 500 -
>>
>> ------------------------------------------------------------------------
>>
>> *type* Exception report
>>
>> *message*
>>
>> *description* _The server encountered an internal error () that
>> prevented it from fulfilling this request._
>>
>> *exception*
>>
>> org.apache.jasper.JasperException
>> org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:372)
>> org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:292)
>> org.apache.jasper.servlet.JspServlet.service(JspServlet.java:236)
>> javax.servlet.http.HttpServlet.service(HttpServlet.java:810)
>> org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:75)
>>
>> *root cause*
>>
>> java.lang.ArrayIndexOutOfBoundsException
>>
>> *note* _The full stack trace of the root cause is available in the
>> Apache Tomcat/5.0.28 logs._
>>
>> ------------------------------------------------------------------------
>>
>>       Apache Tomcat/5.0.28
>>
>> I also tested this search on a newly created index (a small one) but
>> got the same error. I also tried running Nutch-0.8.1, with the same result.
>> I couldn't find any information about this error, and now I don't
>> know what to do. Maybe you have an idea...
>>
>> Thanks in advance...
>>
>> Yours sincerely,
>> Erik H.




Re: Problems Searching an Index with Nutch

Erik Höschler
Alright, I'll try that the next time I'm at work (that will be next Friday,
since I'm just a student worker).
Thanks for your great help ;)

Regards,
-- Erik H.
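
For anyone following the thread: the segment check Gal suggests in the quoted message below can be scripted. Here is a minimal Python sketch, not anything shipped with Nutch. The function name `find_incomplete_segments` is made up for illustration, and the list of expected subfolders is copied from Gal's message; he was unsure himself whether those names apply to 0.7.x, so treat the list as an assumption and adjust it for your version.

```python
import os

# Subfolder names as listed in Gal's message. He was not sure they apply to
# Nutch 0.7.x, so this list is an assumption; change it to match your version.
EXPECTED_SUBDIRS = ("content", "crawl_generate", "crawl_parse",
                    "parse_data", "parse_text")


def find_incomplete_segments(segments_dir, expected=EXPECTED_SUBDIRS):
    """Map each segment name to the expected subfolders it is missing.

    Segments that contain every expected subfolder are left out of the result.
    Plain files sitting directly in segments_dir are ignored.
    """
    incomplete = {}
    for name in sorted(os.listdir(segments_dir)):
        segment_path = os.path.join(segments_dir, name)
        if not os.path.isdir(segment_path):
            continue
        missing = [sub for sub in expected
                   if not os.path.isdir(os.path.join(segment_path, sub))]
        if missing:
            incomplete[name] = missing
    return incomplete
```

Calling `find_incomplete_segments("/srv/opt/nutch-0.7.2/crawl.db/segments")` (the path from the log) would return an empty dict if every segment has all listed subfolders. Any segment that shows up in the result is a candidate for the re-parse Gal mentions.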


Gal Nitzan wrote:

> Erik,
>
> I'm not sure, because I worked with your version a long time ago (I work with
> 0.9 now), so I may be wrong about the "crawl_generate" and "crawl_parse"
> folders in the segment structure.
>
> However, two days ago I had that same exception when one of my segments was
> missing its parse folder.
>
> So maybe you need to parse the segments again (bin/nutch parse
> segments/segmentname).
>
> HTH,
>
> Gal.
>
>
>
> -----Original Message-----
> From: Erik Höschler [mailto:[hidden email]]
> Sent: Friday, January 26, 2007 6:21 PM
> To: [hidden email]
> Subject: Re: Problems Searching an Index with Nutch
>
> Ok,
>
> I could not find any crawl_generate or crawl_parse folder. I also couldn't
> find Catalina.out anywhere on my system?!
>
> One thing I don't understand: Nutch itself should have created my
> folder structure. If there is a fault in it, like
> the missing folders or the 'db' folder which should normally be
> 'linkdb', how can I fix this? I didn't change anything in
> the structure on my own, so it must have been created by Nutch
> directly... Any idea how this could happen?
>
> Thanks for your time ;)
>
> --Erik
>
> Gal Nitzan wrote:
>
>> Well, I guess that db is the linkdb for ver 0.7.
>>
>> Anyway, there is not much info here; maybe you can find more in
>> Catalina.out...
>>
>> One more thing to look for, just maybe it is the reason (long shot)... check
>> each of your segment folders and verify that it contains all 5 folders,
>> i.e. content, crawl_generate, crawl_parse, parse_data, parse_text
>>
>> HTH
>>
>> Gal.
>>
>> -----Original Message-----
>> From: Erik Höschler [mailto:[hidden email]]
>> Sent: Friday, January 26, 2007 5:58 PM
>> To: [hidden email]
>> Subject: Re: Problems Searching an Index with Nutch
>>
>> Hi,
>>
>> I checked my folder structure and everything seems to be correct...
>>
>> :/opt/nutch/crawl.db# l
>> total 8
>> drwxr-xr-x   3 root root   53 2007-01-19 14:11 db
>> drwxr-xr-x   2 root root 4096 2007-01-19 14:18 index
>> drwxr-xr-x  12 root root 4096 2007-01-26 15:06 segments
>>
>> I'm not sure I've ever had a linkdb folder. Or did you mean the db
>> folder listed above?
>>
>> Greetings,
>> Erik
>>
>> Gal Nitzan wrote:
>>
>>> Hi,
>>>
>>> I'm not sure, but it seems to me you are missing the linkdb and segments
>>> folders. They should be located on the same level as the index folder.
>>>
>>> HTH
>>>
>>> Gal
>>>
>>> -----Original Message-----
>>> From: Erik Höschler [mailto:[hidden email]]
>>> Sent: Friday, January 26, 2007 5:04 PM
>>> To: [hidden email]
>>> Cc: Erik
>>> Subject: Problems Searching an Index with Nutch