Nutch on FC4

Nutch on FC4

Sam Robb
Hi,

  I'm experimenting with running Nutch under Fedora Core 4.
I'd like to run Nutch using Tomcat 5 started as a service
via /etc/init.d - the default setup under FC4.  However, the
last part of the Nutch tutorial states:

  "The webapp finds its indexes in ./segments, relative to
   where you start Tomcat, so, if you've done intranet crawling,
   connect to your crawl directory, or, if you've done whole-web
   crawling, don't change directories, and give the command:
   ~/local/tomcat/bin/catalina.sh start"

  I'm new to Tomcat, Nutch, etc. and I'm wondering what I should
do to set things up so that Nutch can find ./segments in the
expected location?

  Thanks,

-Samrobb

Re: Nutch on FC4

Andy Lee-2
On Oct 17, 2005, at 1:56 PM, Robb, Sam wrote:
>   "The webapp finds its indexes in ./segments, relative to
>    where you start Tomcat, so, if you've done intranet crawling,
>    connect to your crawl directory, or, if you've done whole-web
>    crawling, don't change directories, and give the command:
>    ~/local/tomcat/bin/catalina.sh start"
>
>   I'm new to Tomcat, Nutch, etc. and I'm wondering what I should
> do to set things up so that Nutch can find ./segments in the
> expected location?

You can declare the location of the segments by adding a setting to
nutch-site.xml.

Settings in nutch-site.xml override settings in nutch-default.xml.  
Both files are looked for on your classpath.  It might be clearer if  
you see the source where the files are loaded -- I think it's in  
NutchConf.java.

See the searcher.dir property in nutch-default.xml and set it to
what you want in nutch-site.xml.

--Andy


Re: Nutch on FC4

cf-auto
In reply to this post by Sam Robb
Hi

add
<property>
   <name>searcher.dir</name>
   <value>/path/to/your/segments/dir/</value>
</property>

to WEB-INF/classes/nutch-site.xml

I'm using an absolute path in <value>.
The path should point to the directory that contains the
"segments"-directory.
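
For reference, a minimal sketch of what the complete file might look
like. The <nutch-conf> root element is an assumption based on Nutch
releases from around this time (other releases may use a different root
element, so copy whatever your nutch-default.xml uses), and
/path/to/your/crawl is only a placeholder for the directory that holds
"segments":

```xml
<?xml version="1.0"?>
<!-- WEB-INF/classes/nutch-site.xml: site-specific overrides of nutch-default.xml -->
<!-- Root element is an assumption; match the one in your nutch-default.xml -->
<nutch-conf>
  <property>
    <name>searcher.dir</name>
    <!-- Placeholder: absolute path to the directory that CONTAINS segments/ -->
    <value>/path/to/your/crawl</value>
  </property>
</nutch-conf>
```

With an absolute path here, it should no longer matter which directory
Tomcat is started from, which is the point when the service is launched
via /etc/init.d. Tomcat (or at least the webapp) will probably need a
restart before the change is picked up.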



regards
c

On Monday, 17.10.2005, at 13:56 -0400, Robb, Sam wrote:

> Hi,
>
>   I'm experimenting with running Nutch under Fedora Core 4.
> I'd like to run Nutch using Tomcat 5 started as a service
> via /etc/init.d - the default setup under FC4.  However, the
> last part of the Nutch tutorial states:
>
>   "The webapp finds its indexes in ./segments, relative to
>    where you start Tomcat, so, if you've done intranet crawling,
>    connect to your crawl directory, or, if you've done whole-web
>    crawling, don't change directories, and give the command:
>    ~/local/tomcat/bin/catalina.sh start"
>
>   I'm new to Tomcat, Nutch, etc. and I'm wondering what I should
> do to set things up so that Nutch can find ./segments in the
> expected location?
>
>   Thanks,
>
> -Samrobb


RE: Nutch on FC4

Sam Robb
In reply to this post by Sam Robb
> Hi
>
> add
> <property>
>    <name>searcher.dir</name>
>    <value>/path/to/your/segments/dir/</value>
> </property>
>
> to WEB-INF/classes/nutch-site.xml
>
> I'm using an absolute path in <value>.
> The path should point to the directory that contains the
> "segments"-directory.

Thanks to both you and Andy Lee for pointing out the solution
and exposing me to the wider world of Nutch configuration
parameters :-)

-Samrobb