error in using generate command

error in using generate command

beats
Hi,

I'm getting this weird error (weird to me, at least).

I'm trying to crawl some web pages.
With the normal crawl command I am able to crawl and index with no problem at all.

But when I use each command separately (inject, generate, ...)
I get this error:

Generator: 0 records selected for fetching, exiting ...

The command I'm using is:

bin/nutch inject test.crawl/crawldb urls/seed.txt

This successfully inserts the URLs.

Then, when I use this:

bin/nutch generate test.crawl/crawldb test.crawl/segments

it gives:

Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: monster.crawl/segments/20090718135110
Generator: filtering: true
Generator: jobtracker is 'local', generating exactly one partition.
Generator: reached
Generator: 0 records selected for fetching, exiting ...

Whereas when I use the crawl command, it gives the correct result.

I'm using the inject and generate commands on a fresh crawl dir.

Please help!

With regards,

Tarun

Re: error in using generate command

beats

Hi,

I am still not able to solve this problem.

Any ideas?

Re: error in using generate command

Doğacan Güney-3
On Thu, Jul 23, 2009 at 10:58, Beats<[hidden email]> wrote:
>
>
> hi..
>
> i am not able to solve this problem
>

Crawl command uses crawl-urlfilter.txt while inject/generate/etc. commands
use other files (such as regex-urlfilter.txt). So you should check your filters.
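
One rough way to check the filters is to run each seed URL through the rules by hand. The sketch below is not Nutch code: it only approximates how a regex URL filter file is evaluated, assuming each rule is a `+` or `-` followed by a regex, that the first matching rule wins, and that a URL matching no rule is rejected. The helper name `check_url` is hypothetical, and `grep -E` stands in for Java regex syntax, so unusual patterns may behave differently.

```shell
#!/bin/sh
# Sketch only: approximates how a Nutch regex URL filter file is evaluated.
# Assumptions (not from the thread): rules are '+regex' or '-regex', one per
# line, the first matching rule wins, and a URL matching no rule is rejected.
# grep -E stands in for Java regex syntax, so exotic patterns may differ.

# check_url RULES_FILE URL   (hypothetical helper name)
check_url() {
    rules_file=$1
    url=$2
    while IFS= read -r rule || [ -n "$rule" ]; do
        case $rule in
            ''|'#'*) continue ;;          # skip blank lines and comments
        esac
        pattern=${rule#?}                 # the rule without its leading sign
        sign=${rule%"$pattern"}           # the leading '+' or '-'
        if printf '%s\n' "$url" | grep -Eq -e "$pattern"; then
            if [ "$sign" = "+" ]; then
                echo "accepted $url"
            else
                echo "rejected $url"
            fi
            return 0
        fi
    done < "$rules_file"
    echo "rejected $url"                  # no rule matched
}
```

Assuming the filter file lives at `conf/regex-urlfilter.txt`, something like `while IFS= read -r u; do check_url conf/regex-urlfilter.txt "$u"; done < urls/seed.txt` would print a verdict for every seed URL. If they all come back rejected, the generate step has nothing to select. Running `bin/nutch readdb test.crawl/crawldb -stats` after the inject should also show how many URLs actually made it into the crawldb.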

> Any Ideas???



--
Doğacan Güney

Re: error in using generate command

Alex McLintock
In reply to this post by beats
Why does your example say both monster.crawl and test.crawl?

Are you perhaps entering the command wrong or is this just an error in
the email?

Alex

Re: error in using generate command

beats
Sorry for the mistake, it is just a typing error.

Thanks for replying.