How to update search.dir with least interruption of service?

yawl.62952928
When running a real-world search engine, we will have a script that fetches continuously and re-indexes periodically. I am wondering how people manage their segment and index data: do you let your crawl script write directly to the webapp's 'searcher.dir', or do you let it write to a separate location and then copy the result over to 'searcher.dir'?

It also seems that I have to restart the J2EE server, or at least redeploy the webapp, before search.jsp will read the new data. What is the best practice for keeping the interruption of service to a minimum?
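For the second option in the question (build somewhere else, then put it in place), one way to avoid ever exposing a half-copied directory is to point the search directory (searcher.dir) at a symbolic link and re-point that link with a single rename once a crawl has finished. The sketch below is a minimal illustration only, assuming a POSIX file system with symlink support; `CrawlDirSwitcher`, `switchTo`, and the idea of a `current` link are hypothetical, not part of Nutch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: assumes searcher.dir is configured as a symbolic link
// (e.g. crawl/current) that points at the most recent finished crawl directory.
class CrawlDirSwitcher {

    // Re-points `liveLink` at `newCrawlDir` in one step.
    // A fresh link is created next to the live one and then renamed over it;
    // on POSIX file systems that rename replaces the old link atomically, so a
    // reader sees either the old target or the new one, never a missing path.
    static void switchTo(Path liveLink, Path newCrawlDir) throws IOException {
        Path tmpLink = liveLink.resolveSibling(liveLink.getFileName() + ".tmp");
        Files.deleteIfExists(tmpLink);  // remove a leftover link from a failed run
        Files.createSymbolicLink(tmpLink, newCrawlDir.toAbsolutePath());
        // Moves the link itself (not its target) over the live link.
        Files.move(tmpLink, liveLink, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

Swapping the link only changes what newly opened files see; a NutchBean that is already running keeps using the index it opened earlier, so the webapp still needs to be told to reload.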


Thank you very much.

Yong



RE: How to update search.dir with least interruption of service?

Howie Wang

I've always wondered about the first part myself, but for the second part, you don't have to restart the app server. You can create a page or script that discards the cached NutchBean so that a fresh one is created against the new index. Something like:

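    // In a JSP, `application` is the ServletContext; drop the cached bean...
    application.removeAttribute("nutchBean");
    // ...so that this call builds a new one, which opens the current index.
    NutchBean bean = NutchBean.get(application);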

Then just request this page after copying the new index into the search directory.
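Removing the attribute and then recreating the bean leaves a short moment in which no bean is cached, and concurrent requests may each try to build one. A variant, sketched here under stated assumptions, opens the replacement first and then publishes it with a single atomic swap. `Searcher` and `SearcherHolder` are hypothetical stand-ins for NutchBean and the servlet-context attribute; this is not Nutch's API.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical stand-in for the object that serves queries (NutchBean above).
interface Searcher extends AutoCloseable {
    String indexDir();

    @Override
    void close();
}

// Holds one shared searcher and replaces it without an empty window:
// the new searcher is fully opened before it becomes visible to requests.
class SearcherHolder {
    private final AtomicReference<Searcher> current = new AtomicReference<>();
    private final Supplier<Searcher> factory;

    SearcherHolder(Supplier<Searcher> factory) {
        this.factory = factory;
        this.current.set(factory.get());
    }

    // What each search request would call instead of reading the attribute.
    Searcher get() {
        return current.get();
    }

    // Call after the new index is in place (e.g. from an admin page).
    void reload() {
        Searcher fresh = factory.get();          // 1. open the new index first
        Searcher old = current.getAndSet(fresh); // 2. publish it atomically
        if (old != null) {
            // 3. Simplification: queries that fetched `old` just before the
            //    swap may still be running; a real version would wait for them
            //    (e.g. reference counting) before closing.
            old.close();
        }
    }
}
```

The design choice is simply "build, swap, then release": requests always find a usable searcher, at the cost of briefly holding two indexes open.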

Howie


> Date: Wed, 27 Feb 2008 21:52:54 +0000
> From: [hidden email]
> To: [hidden email]
> Subject: How to update search.dir with least interruption of service?

Re: How to update search.dir with least interruption of service?

Dennis Kubes-2
In reply to this post by yawl.62952928
If you are using the distributed search server, there is now the capability to add/remove servers in the search-servers.txt file, and these changes will get picked up within a few seconds. So the general process would be:

1) Do the reindexing in a different directory.
2) Start up a distributed search server pointing to the new directory.
3) Change the search-servers.txt file: remove the old server and add the new one.
4) Save search-servers.txt.
5) Shut down the old distributed search server.
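Steps 3 and 4 amount to swapping one entry in search-servers.txt. A minimal sketch, assuming the file lists one server per line as a host name and a port separated by whitespace; `SearchServersFile` and `replaceServer` are hypothetical helpers. Writing a complete temporary file and renaming it over the original is meant to keep the search front end from ever reading a half-saved list.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Sketch of steps 3-4: replace one "host port" line in search-servers.txt.
class SearchServersFile {

    static void replaceServer(Path file, String oldHost, int oldPort,
                              String newHost, int newPort) throws IOException {
        List<String> out = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String[] parts = line.trim().split("\\s+");
            boolean isOld = parts.length == 2
                    && parts[0].equals(oldHost)
                    && parts[1].equals(Integer.toString(oldPort));
            if (!isOld) {
                out.add(line);                 // keep every other server unchanged
            }
        }
        out.add(newHost + " " + newPort);      // add the new server at the end

        // Write the full new list first, then rename it over the old file.
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.write(tmp, out, StandardCharsets.UTF_8);
        Files.move(tmp, file, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

For two servers on one machine the call would look like `replaceServer(path, "localhost", 9999, "localhost", 9998)`; the port numbers are only examples.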

The searcher.dir setting stays the same, and if you are running both search servers (new and old) on a single machine, simply give them different ports.
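When the old and new servers share a machine and differ only by port, one optional safeguard (not part of the steps listed above) is to confirm that the new port accepts connections before editing search-servers.txt. A sketch with a hypothetical `PortProbe.waitForPort` helper; note that an open port only shows the process is listening, which may not guarantee that its index has finished loading.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical readiness check to run after starting the new server
// and before switching search-servers.txt over to it.
class PortProbe {

    // Returns true as soon as host:port accepts a TCP connection, or false
    // if it still refuses after `attempts` tries spaced `delayMillis` apart.
    static boolean waitForPort(String host, int port, int attempts, long delayMillis)
            throws InterruptedException {
        for (int i = 0; i < attempts; i++) {
            try (Socket s = new Socket()) {
                s.connect(new InetSocketAddress(host, port), 1000); // 1 s timeout
                return true;
            } catch (IOException notYet) {
                Thread.sleep(delayMillis);  // not listening yet; wait and retry
            }
        }
        return false;
    }
}
```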

Dennis
