Eclipse-Crawl Problem

Eclipse-Crawl Problem

Volkan Ebil
I configured Eclipse following the RunNutchInEclipse0.9 document. But when I give
the arguments to Eclipse
and run the project, it gives the message "No URLs to fetch - check your seed list
and URL filters".
I have changed the line in crawl-urlfilter.txt
+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/
to
+.
as was suggested before.
But it didn't solve my problem.
Thanks for your help.
 
Volkan.

 


Re: Eclipse-Crawl Problem

flocke
Hey Volkan,

Did you specify any seed URLs in a file in the folder you pass to Nutch
with the parameter -urls? This is necessary to give Nutch some point(s)
to start the crawl from.


Greets,
Christoph
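
A quick way to rule out an empty seed list is to print what the seed folder actually contains. The following is only a sketch, not part of Nutch: the function name `read_seed_urls` is hypothetical, the demo URL is a placeholder, and it assumes the seed folder holds plain-text files with one URL per line.

```python
import tempfile
from pathlib import Path


def read_seed_urls(seed_dir):
    """Collect every non-blank line from the plain-text files in seed_dir."""
    urls = []
    for path in sorted(Path(seed_dir).iterdir()):
        if path.is_file():
            for line in path.read_text(encoding="utf-8").splitlines():
                line = line.strip()
                if line:
                    urls.append(line)
    return urls


if __name__ == "__main__":
    # Demo with a throwaway folder; pass your real seed folder instead.
    with tempfile.TemporaryDirectory() as demo_dir:
        seed_file = Path(demo_dir, "seed.txt")
        seed_file.write_text("http://www.example.com/\n\n", encoding="utf-8")
        print(read_seed_urls(demo_dir))
```

If the list comes back empty for the folder that the Eclipse run configuration points to, the seed list is the first thing to fix; if it is not empty, the URL filters are the next suspect.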
 

RE: Eclipse-Crawl Problem

Volkan Ebil
Yes, I know how to start the crawl process. I have created the URL text file in
the specified folder. The problem occurs in the Eclipse environment.
Does anybody know something about my problem?
Thanks.

RE: Eclipse-Crawl Problem

flocke
So it is probably a problem within crawl-urlfilter.txt.

Does the problem occur without Eclipse, too? (To be sure that your
Eclipse configuration is correct.)

I have Nutch running in Eclipse without problems.

Maybe you should post your complete crawl-urlfilter.txt.

Chris

RE: Eclipse-Crawl Problem

flocke
In reply to this post by Volkan Ebil
I just saw that you only changed the one line in urlfilter.txt you
described.

So I suppose it still contains the "-." line. If so, try it without that
line; this might solve your problem.

Chris

RE: Eclipse-Crawl Problem

Volkan Ebil
OK, I'll post it, but there is no problem without Eclipse.
Thanks for your interest.

crawl-urlfilter.txt (949 bytes) Download Attachment

RE: Eclipse-Crawl Problem

flocke
Hmm, a crawl with your crawl-urlfilter.txt works perfectly inside Eclipse.

You basically accept everything.

This is what I had in my urls.txt:

http://www.sabah.com

Looks like I can't help you, sorry.

Chris

RE: Eclipse-Crawl Problem

kishore.krishna2
In reply to this post by Volkan Ebil
Hi,
put
+.
at the top of the URL filter txt file.
Thanks,
kishore

Re: Eclipse-Crawl Problem

Mark J. Hoy
In reply to this post by Volkan Ebil
Volkan -

You need to remove the comment (#) from the line:

#+^http://([a-z0-9]*\.)*sabah.com/

to allow it to crawl the sabah.com domain. You can keep the -. line at the bottom, as Nutch processes the rules in the order they are found.
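
To see why the order of the rules matters, here is a minimal sketch of a first-match-wins filter. It is an illustration in Python, not Nutch's own Java code: the names `parse_rules` and `accepts` are hypothetical, it assumes that the first rule whose regex is found in the URL decides the outcome and that a URL matching no rule is rejected, and Python's regex dialect is only close to Java's.

```python
import re


def parse_rules(text):
    """Turn filter lines into (accept, regex) pairs, skipping blanks and # comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return the verdict of the first rule found in url; reject if none match."""
    for accept, regex in rules:
        if regex.search(url):
            return accept
    return False


# The accept rule is still commented out, so only "-." is active.
COMMENTED = r"""
#+^http://([a-z0-9]*\.)*sabah.com/
-.
"""

# The accept rule is active and comes before the catch-all reject.
UNCOMMENTED = r"""
+^http://([a-z0-9]*\.)*sabah.com/
-.
"""

if __name__ == "__main__":
    url = "http://www.sabah.com/"
    print("commented out:", accepts(parse_rules(COMMENTED), url))
    print("uncommented:  ", accepts(parse_rules(UNCOMMENTED), url))
```

Under these assumptions the commented-out variant rejects the seed URL, because "-." is the only active rule, while the uncommented variant accepts it before "-." is ever reached. A "+." line placed at the top would likewise accept everything.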



