Running nutch on a non-port 80 site

Deepa Devanathan-2
Hi,

I have a setup where a non-Apache web server serves content on a port other
than 80, alongside a Tomcat instance for JSP content. I have installed Nutch
and run the crawl program.

The indexes are not being created properly: I could not see the URLs of the
pages being indexed in the crawl program's log.

Any ideas as to what the problem with the indexes could be?

Any help is greatly appreciated.
Thanks in advance,
Deepa

Re: Running nutch on a non-port 80 site

Alexander E Genaud
Deepa,

Is Nutch serving search results on another port, or is your content
(the stuff being crawled) on a port other than 80? If the content is
globally accessible, the port number should not matter. Nutch will
crawl and index http://yourhost:1234/ like any other web site. Are you
using HTTP, FILE, or some other protocol to crawl?
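A non-80 port is simply part of the URL, so in principle nothing special is needed. One place it can still bite, assuming your crawl uses a regex URL filter whose accept rule was written without a port (for example a rule that expects `/` right after the host name), is that http://yourhost:1234/ never matches and those pages are silently skipped, so they never show up in the log. The Java sketch below illustrates this with plain `java.util.regex`; the host name `yourhost`, the class `PortFilterDemo`, and both patterns are hypothetical, and evaluating rules with `find()` is an assumption about how such a filter applies them, not a description of your actual configuration.

```java
import java.net.URI;
import java.util.regex.Pattern;

public class PortFilterDemo {
    // Hypothetical accept rule in the style of a regex URL filter:
    // it requires '/' immediately after the host name, so ":1234" breaks the match.
    static final String HOST_ONLY = "^http://([a-z0-9]*\\.)*yourhost/";

    // Same hypothetical rule, but allowing an optional ":port" after the host.
    static final String HOST_WITH_PORT = "^http://([a-z0-9]*\\.)*yourhost(:[0-9]+)?/";

    // Assumption: the rule is applied as a search anchored by '^', not a full match.
    static boolean accepts(String regex, String url) {
        return Pattern.compile(regex).matcher(url).find();
    }

    public static void main(String[] args) {
        String url = "http://yourhost:1234/index.jsp";

        // The port is an ordinary component of the URL.
        URI uri = URI.create(url);
        System.out.println("host=" + uri.getHost() + " port=" + uri.getPort());

        // Compare the two hypothetical rules against the same URL.
        System.out.println("host-only rule accepts: " + accepts(HOST_ONLY, url));
        System.out.println("port-aware rule accepts: " + accepts(HOST_WITH_PORT, url));
    }
}
```

Running `main` reports host `yourhost` and port `1234`, and shows the host-only rule rejecting the URL while the port-aware rule accepts it. If your filter has a rule like the first one, widening it to allow a port is the kind of change to look for.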

Alex
--
CCC7 D19D D107 F079 2F3D BF97 8443 DB5A 6DB8 9CE1
--

From: "Deepa Devanathan" <[hidden email]>
To: [hidden email]
Date: Fri, 28 Apr 2006 13:51:54 +0530
Subject: Running nutch on a non-port 80 site