urls list crawling


urls list crawling

Abdelhakim Diab
I want to crawl a list of sites, but when I put the URLs in the urls.txt
file, the crawler fetches only the first URL and none of the others.
How can I solve this problem?
The URLs:
http://lucene.apache.org/nutch/
http://www.spacetoon.com 


Re: urls list crawling

Nuther
Hi, Abdelhakim.

Which version of Nutch are you using?
You wrote on 26 June 2006, 17:04:12:

> I want to crawl a list of sites , but when I put the urls in the urls.txt
> file the crawler fetches the first url just.
> and no fetching for the other urls
> how can I solve this problem .
> the urls :
> http://lucene.apache.org/nutch/
> http://www.spacetoon.com 




--
Regards,
 Dima                          mailto:[hidden email]


Re: urls list crawling

Abdelhakim Diab
Thanks for your reply.
I solved the problem.
I am using Nutch 0.7.2.
The problem was in the filter.
Thanks very much.
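
The post does not say which filter was at fault or how it was changed. One plausible cause, assumed here, is a regex URL filter whose only accept rule covers a single domain, so every other seed URL is dropped before fetching. The Python sketch below only imitates first-match-wins filtering of that kind; the rule list and the `accepts` helper are hypothetical and are not taken from the poster's configuration.

```python
import re

# Hypothetical rules in the style of a regex URL filter file:
# '+' accepts, '-' rejects, and the first rule that matches decides.
RULES = [
    ("-", r"\.(gif|jpg|png|css|zip)$"),            # skip common binary/asset suffixes
    ("+", r"^http://([a-z0-9]*\.)*apache\.org/"),  # accept only hosts under apache.org
    ("-", r"."),                                   # reject everything else
]

def accepts(url, rules=RULES):
    """Return True if the first matching rule is an accept ('+') rule."""
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False  # no rule matched: treat the URL as rejected

seeds = ["http://lucene.apache.org/nutch/", "http://www.spacetoon.com"]
kept = [u for u in seeds if accepts(u)]
print(kept)
```

With these assumed rules only the first seed survives, which would match the symptom described. Adding an accept rule for the second domain ahead of the final reject rule lets both seeds through.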



RE: urls list crawling

Tonal Web Design - Stijn
I'm trying to get Nutch working for my large web site, but I can't find
answers to basic questions after looking all over the Nutch site and
searching Google.

1) Why doesn't 0.7.2 allow me to search by "title:"? I have 15 different
fields showing in Luke, but I can only search two of them, "url:" and
"site:". Is that it?

2) How would I add an additional searchable field, such as "author:"?

3) Is there a way to search in "anchor:"?

4) Can't you do wildcard searches, like "d?g" or "t*est"?

5) Why does it seem that Nutch doesn't support Lucene's full set of
query types?

6) I'm using this mostly for site search, and I have access to the database.
Would it be better to just use Lucene and index my database instead of
using Nutch? Is there a Lucene-based application that's better suited to
indexing a database, and that preferably outputs OpenSearch XML?


Also, which IRC channel do you guys use?









______________________________________
Tonal web design and hosting
http://tonalweb.com
eCommerce development & marketing