Re: help with hardware requirements

Tomislav Poljak
Hi,
what would be a recommended hardware specification for a machine running a
searcher web application with 15K users per day, which uses an index of 100K
URLs (crawling is done by another machine)? What is a good practice for
getting the index from the crawl machine to the search machine (if using
separate machines for crawling and searching)?

Thanks,
      Tomislav

Otis Gospodnetic-2
Hi,

I'm curious about what Tomislav is asking, too -- how do searchers know when to reopen the index? That is, say you have a cluster of fetchers, and every once in a while you end up with a newer version of an index (or indices) which you simply scp to the searchers. How do you signal the searcher webapps to go and reopen the new index?

In Solr-land, for example, this is done by issuing a "commit" command, which tells the Solr IndexSearcher to, among other things, reopen the index.  In pure Lucene-land, you check the index version via IndexReader.  How about in Nutchlandia?
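For the pure-Lucene case, here is a minimal sketch of what that version check might look like, assuming a Lucene 2.x-era API -- the ReopeningSearcher class and its method names are just illustration, not anything Nutch or Lucene ships:

import java.io.File;
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Hypothetical helper: keeps an IndexSearcher open and swaps in a new one
// whenever the on-disk index version changes (e.g. after a new index has
// been scp-ed into place).
public class ReopeningSearcher {

    private final Directory dir;
    private IndexReader reader;
    private IndexSearcher searcher;

    public ReopeningSearcher(File indexDir) throws IOException {
        dir = FSDirectory.getDirectory(indexDir);
        reader = IndexReader.open(dir);
        searcher = new IndexSearcher(reader);
    }

    // Call from a timer thread, or before serving a query.
    public synchronized void maybeReopen() throws IOException {
        // Compare the version stamp of the index on disk with the version
        // of the reader we currently have open.
        if (IndexReader.getCurrentVersion(dir) != reader.getVersion()) {
            IndexReader oldReader = reader;
            reader = IndexReader.open(dir);
            searcher = new IndexSearcher(reader);
            oldReader.close(); // in real code, wait for in-flight searches first
        }
    }

    public synchronized IndexSearcher getSearcher() {
        return searcher;
    }
}

The webapp would call maybeReopen() periodically (or after being poked by whatever copies the new index over), which is roughly what Solr's "commit" does for you behind the scenes.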

Also, is scp-ing/rsyncing the index over to searcher boxes the way to go?  I didn't see this covered on the Wiki, but maybe I didn't search well enough? ;)

Thanks,
Otis
