NullPointerException

24 messages

Re: NullPointerException

Hasan Diwan
Right then. I compiled the svn version of Nutch and tried running the
crawl with it. This is the log:
server: 11:32pm % ./bin/nutch crawl ../SpectraSearch/urls -dir ../SpectraSearch/crawl -depth 2 -threads 20
060305 233255 parsing jar:file:/home/hdiwan/nutch/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060305 233255 parsing file:/home/hdiwan/nutch/conf/nutch-default.xml
060305 233255 parsing file:/home/hdiwan/nutch/conf/crawl-tool.xml
060305 233255 parsing jar:file:/home/hdiwan/nutch/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060305 233255 parsing file:/home/hdiwan/nutch/conf/nutch-site.xml
060305 233255 parsing file:/home/hdiwan/nutch/conf/hadoop-site.xml
060305 233256 crawl started in: ../SpectraSearch/crawl
060305 233256 rootUrlDir = ../SpectraSearch/urls
060305 233256 threads = 20
060305 233256 depth = 2
060305 233256 Injector: starting
060305 233256 Injector: crawlDb: ../SpectraSearch/crawl/crawldb
060305 233256 Injector: urlDir: ../SpectraSearch/urls
060305 233256 Injector: Converting injected urls to crawl db entries.
060305 233256 parsing jar:file:/home/hdiwan/nutch/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060305 233256 parsing file:/home/hdiwan/nutch/conf/nutch-default.xml
060305 233256 parsing file:/home/hdiwan/nutch/conf/crawl-tool.xml
060305 233256 parsing jar:file:/home/hdiwan/nutch/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060305 233256 parsing jar:file:/home/hdiwan/nutch/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060305 233256 parsing file:/home/hdiwan/nutch/conf/nutch-site.xml
060305 233256 parsing file:/home/hdiwan/nutch/conf/hadoop-site.xml
060305 233256 parsing jar:file:/home/hdiwan/nutch/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060305 233256 parsing file:/home/hdiwan/nutch/conf/nutch-default.xml
060305 233256 parsing file:/home/hdiwan/nutch/conf/crawl-tool.xml
060305 233256 parsing jar:file:/home/hdiwan/nutch/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060305 233256 parsing jar:file:/home/hdiwan/nutch/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060305 233256 parsing jar:file:/home/hdiwan/nutch/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060305 233256 parsing file:/home/hdiwan/nutch/conf/nutch-site.xml
060305 233256 parsing file:/home/hdiwan/nutch/conf/hadoop-site.xml
060305 233256 Running job: job_7n6bsm
060305 233256 parsing jar:file:/home/hdiwan/nutch/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060305 233256 parsing jar:file:/home/hdiwan/nutch/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060305 233256 parsing /tmp/hadoop/mapred/local/localRunner/job_7n6bsm.xml
060305 233256 parsing file:/home/hdiwan/nutch/conf/hadoop-site.xml
java.io.IOException: No input directories specified in: Configuration: defaults: hadoop-default.xml , mapred-default.xml , /tmp/hadoop/mapred/local/localRunner/job_7n6bsm.xmlfinal: hadoop-site.xml
        at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:84)
        at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:94)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:70)
060305 233257  map 0%  reduce 0%
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
I need to sleep now, so I'll check back tomorrow. Thanks for all the help!
--
Cheers,
Hasan Diwan <[hidden email]>

Re: NullPointerException

Howie Wang
In reply to this post by Hasan Diwan
I didn't see query-basic or query-more in your list of included plugins.
Those are what usually handle most queries. query-url will only handle
parts of the query that look like url:http://www.google.com, and
query-site handles site:www.google.com. Nothing seems to be handling
plain text in the content.

Is query-basic or query-more included in your nutch-default.xml?

I'm not sure why you don't see anything in Luke though.

Howie

>From: "Hasan Diwan" <[hidden email]>

>Mr Tang:
> > Crawling seems ok. Can you pls try org.apache.nutch.searcher.NutchBean
> > [your-query-string] in shell/cmd?
>
>server: 7:20pm % ./bin/nutch org.apache.nutch.searcher.NutchBean hasan
>060305 192042 10 parsing
>file:/home/hdiwan/nutch-0.7.1/conf/nutch-default.xml
>060305 192042 10 parsing file:/home/hdiwan/nutch-0.7.1/conf/nutch-site.xml
>060305 192042 10 opening merged index in
>/home/hdiwan/SpectraSearch/crawl/index
>060305 192042 10 Plugins: looking in:
>/home/hdiwan/nutch-0.7.1/build/plugins
>060305 192042 10 parsing:
>/home/hdiwan/nutch-0.7.1/build/plugins/nutch-extensionpoints/plugin.xml
>060305 192042 10 not including:
>/home/hdiwan/nutch-0.7.1/build/plugins/protocol-file
>060305 192042 10 not including:
>/home/hdiwan/nutch-0.7.1/build/plugins/protocol-ftp
>060305 192042 10 not including:
>/home/hdiwan/nutch-0.7.1/build/plugins/protocol-http
>060305 192042 10 parsing:
>/home/hdiwan/nutch-0.7.1/build/plugins/protocol-httpclient/plugin.xml
>060305 192042 10 impl: point=org.apache.nutch.protocol.Protocol
>class=org.apache.nutch.protocol.httpclient.Http
>060305 192042 10 impl: point=org.apache.nutch.protocol.Protocol
>class=org.apache.nutch.protocol.httpclient.Http
>060305 192042 10 parsing:
>/home/hdiwan/nutch-0.7.1/build/plugins/parse-html/plugin.xml
>060305 192042 10 impl: point=org.apache.nutch.parse.Parser
>class=org.apache.nutc
>che.nutch.searcher.more.TypeQueryFilter
>060305 192043 10 impl: point=org.apache.nutch.searcher.QueryFilter
>class=org.apache.nutch.searcher.more.DateQueryFilter
>060305 192043 10 parsing:
>/home/hdiwan/nutch-0.7.1/build/plugins/query-site/plugin.xml
>060305 192043 10 impl: point=org.apache.nutch.searcher.QueryFilter
>class=org.apache.nutch.searcher.site.SiteQueryFilter
>060305 192043 10 parsing:
>/home/hdiwan/nutch-0.7.1/build/plugins/query-url/plugin.xml
>060305 192043 10 impl: point=org.apache.nutch.searcher.QueryFilter
>class=org.apache.nutch.searcher.url.URLQueryFilter
>060305 192043 10 not including:
>/home/hdiwan/nutch-0.7.1/build/plugins/urlfilter-regex
>060305 192043 10 not including:
>/home/hdiwan/nutch-0.7.1/build/plugins/urlfilter-prefix
>060305 192043 10 not including:
>/home/hdiwan/nutch-0.7.1/build/plugins/creativecommons
>060305 192043 10 not including:
>/home/hdiwan/nutch-0.7.1/build/plugins/language-identifier
>060305 192043 10 not including:
>/home/hdiwan/nutch-0.7.1/build/plugins/clustering-carrot2
>060305 192043 10 not including:
>/home/hdiwan/nutch-0.7.1/build/plugins/ontology
>Total hits: 0
>--
>Cheers,
>Hasan Diwan <[hidden email]>



Re: NullPointerException

Hasan Diwan
On 06/03/06, Howie Wang <[hidden email]> wrote:
> Is query-basic or query-more included in your nutch-default.xml?

It is indeed included in my nutch-site.xml :-

 <property>
  <name>plugin.includes</name>
  <value>protocol-httpclient|urlfilter-regex|parse-(text|html|js)|index-more|query-(more|site|url)</value>
 </property>
Thanks for the help!
--
Cheers,
Hasan Diwan <[hidden email]>

Re: NullPointerException

Howie Wang
Hi, Hasan,

Looking more carefully at the query-more plugin, it seems that it
only adds functionality for date queries and type queries. I think
you need to add query-basic to the list also to get it to search
the default content. Can you try adding query-basic and running:

bin/nutch search http
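
To illustrate why the quoted value leaves query-basic out, here is a minimal sketch. It assumes that Nutch treats plugin.includes as a regular expression that has to match a plugin's whole id; that is how the value reads, but it is an assumption and not something confirmed in this thread. The class name PluginIncludesCheck and the REVISED value (the quoted value with basic added to the query-(...) group) are hypothetical and only for illustration.

```java
import java.util.regex.Pattern;

// Sketch only: assumes plugin.includes is a regular expression that must
// match a plugin's whole id. This class is not part of Nutch.
class PluginIncludesCheck {
    // Value quoted from the nutch-site.xml posted in this thread.
    static final String CURRENT =
        "protocol-httpclient|urlfilter-regex|parse-(text|html|js)|index-more|query-(more|site|url)";

    // Hypothetical revision: the same value with query-basic added.
    static final String REVISED =
        "protocol-httpclient|urlfilter-regex|parse-(text|html|js)|index-more|query-(basic|more|site|url)";

    // Returns true when the whole plugin id matches the includes expression.
    static boolean isIncluded(String includes, String pluginId) {
        return Pattern.compile(includes).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        String[] ids = {"query-basic", "query-more", "query-site", "query-url"};
        for (String id : ids) {
            // Print whether each query plugin id is matched by either value.
            System.out.printf("%-12s current=%-5b revised=%b%n",
                id, isIncluded(CURRENT, id), isIncluded(REVISED, id));
        }
    }
}
```

Under that assumption, query-basic is not matched by the current value, while query-more, query-site and query-url are, which would fit the "Total hits: 0" result for a plain-text query.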

Howie

>On 06/03/06, Howie Wang <[hidden email]> wrote:
> > Is query-basic or query-more included in your nutch-default.xml?
>
>It is indeed included in my nutch-site.xml :-
>
>  <property>
>   <name>plugin.includes</name>
>   <value>protocol-httpclient|urlfilter-regex|parse-(text|html|js)|index-more|query-(more|site|url)</value>
>  </property>
>Thanks for the help!
>--
>Cheers,
>Hasan Diwan <[hidden email]>

