Caching the search results


Caching the search results

ocramp
Hi,

Does anybody know how I can set Nutch to cache search results?
I've heard about this feature, but I can't find any information on it.

Thanks,
Marco

Re: Caching the search results

Andrzej Białecki-2
Marco Vanossi wrote:
> Hi,
>
> Does anybody know how I can set Nutch to cache search results?
> I've heard about this feature, but I can't find any information on it.

Trivial web-level caching is easy to implement: just download OSCache
and modify your web application settings according to its documentation.
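
As a rough illustration of what web-level caching amounts to, here is a minimal Java sketch of an in-memory cache of rendered result pages, keyed by request URL, with a size limit and a time-to-live. It is not OSCache's API; the class name `ResultPageCache`, its methods, and the idea of passing the current time in explicitly are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory cache of rendered result pages, keyed by request URL. */
class ResultPageCache {
    /** A cached page body plus the time after which it is considered stale. */
    private static final class CachedPage {
        final String body;
        final long expiresAtMillis;

        CachedPage(String body, long expiresAtMillis) {
            this.body = body;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final long ttlMillis;
    private final Map<String, CachedPage> pages;

    ResultPageCache(final int maxEntries, long ttlMillis) {
        this.ttlMillis = ttlMillis;
        // accessOrder=true keeps entries in least-recently-used order,
        // so the "eldest" entry is the one that was touched longest ago.
        this.pages = new LinkedHashMap<String, CachedPage>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, CachedPage> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Stores a rendered page; it expires ttlMillis after nowMillis. */
    synchronized void put(String url, String body, long nowMillis) {
        pages.put(url, new CachedPage(body, nowMillis + ttlMillis));
    }

    /** Returns the cached page, or null if it is missing or has expired. */
    synchronized String get(String url, long nowMillis) {
        CachedPage page = pages.get(url);
        if (page == null) {
            return null;
        }
        if (nowMillis >= page.expiresAtMillis) {
            pages.remove(url);
            return null;
        }
        return page.body;
    }

    synchronized int size() {
        return pages.size();
    }
}
```

In a web application the caller would pass `System.currentTimeMillis()` as `nowMillis` and fall through to the real search whenever `get` returns null.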

Smart caching on the level of indexes is more difficult to implement,
and Nutch doesn't include anything like that. You may find this paper of
interest:

    http://www2005.org/cdrom/docs/p257.pdf

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



RE: Caching the search results

Chirag Chaman
Marco,

We use a search caching system at Filangy. It uses Lucene to save the search
string, count, date and top 20 IDs of the pages, so on a cache hit all you
have to do is search for those IDs.

Yes, it still involves a search, but we have a distributed system that uses
the ID as the hash key to determine which server holds the details of the
page, which makes the parallel search more efficient. This search is about
60-75% faster than a regular search.
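
The Filangy code was not posted, so the following Java sketch is only a guess at the general shape of this approach. It assumes numeric page IDs and a simple modulo hash for picking a server; `CachedSearch`, `ShardRouter` and all field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical cache record holding the fields named above. */
final class CachedSearch {
    static final int MAX_IDS = 20;

    final String query;         // the search string
    final long count;           // the stored count
    final long cachedAtMillis;  // the date the entry was written
    final List<Long> topIds;    // IDs of the top-ranked pages, at most MAX_IDS

    CachedSearch(String query, long count, long cachedAtMillis, List<Long> rankedIds) {
        this.query = query;
        this.count = count;
        this.cachedAtMillis = cachedAtMillis;
        int keep = Math.min(MAX_IDS, rankedIds.size());
        this.topIds = Collections.unmodifiableList(new ArrayList<>(rankedIds.subList(0, keep)));
    }
}

/** Hypothetical router: maps a page ID to the server assumed to hold its details. */
final class ShardRouter {
    private final int shardCount;

    ShardRouter(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shardCount = shardCount;
    }

    /** Assumption: a plain modulo hash; floorMod keeps negative IDs in range. */
    int shardFor(long pageId) {
        return (int) Math.floorMod(pageId, (long) shardCount);
    }

    /** Groups cached IDs by server so each server is asked only for its own pages. */
    Map<Integer, List<Long>> groupByShard(List<Long> pageIds) {
        Map<Integer, List<Long>> byShard = new TreeMap<>();
        for (long id : pageIds) {
            byShard.computeIfAbsent(shardFor(id), k -> new ArrayList<>()).add(id);
        }
        return byShard;
    }
}
```

On a cache hit, the caller would pass `topIds` to `groupByShard` and send one lookup per server in parallel instead of running the full query on every server.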

You should be able to put a similar implementation together. I'm willing to
release this code to the open domain, PROVIDED that you or anyone else who's
interested changes it to make it generic and releases it as open source to
others in the Nutch community.

CC-
--------------------------------------------
Chirag Chaman | Filangy, Inc.

-----Original Message-----
From: Andrzej Bialecki [mailto:[hidden email]]
Sent: Tuesday, September 05, 2006 8:20 AM
To: [hidden email]
Subject: Re: Caching the search results


Re: [Nutch-general] Caching the search results

Otis Gospodnetic-2-2
In reply to this post by ocramp
You may want to consider using memcached - http://www.danga.com/memcached/ - it's super simple and super stable.  I use it over at Simpy.com and the memcached daemon there has been up for months without showing any signs of trouble.
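
For anyone trying this with search results, here is a minimal cache-aside sketch in Java. It deliberately avoids naming a real memcached client: `KeyValueCache` is a hypothetical stand-in interface, and `SearchResultCache` and the "search:" key prefix are made up for the example. The query is hashed because memcached keys are limited to 250 bytes and may not contain whitespace or control characters; treating queries as case-insensitive is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;
import java.util.function.Function;

/** Hypothetical stand-in for whichever memcached client library is used. */
interface KeyValueCache {
    String get(String key);                              // null on a miss
    void set(String key, String value, int ttlSeconds);
}

/** Hypothetical cache-aside wrapper around a search backend. */
final class SearchResultCache {
    private final KeyValueCache cache;
    private final int ttlSeconds;

    SearchResultCache(KeyValueCache cache, int ttlSeconds) {
        this.cache = cache;
        this.ttlSeconds = ttlSeconds;
    }

    /** Builds a short key without whitespace from a free-text query. */
    static String cacheKey(String query) {
        // Assumption: queries differing only in case or spacing are equivalent.
        String normalized = query.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha.digest(normalized.getBytes(StandardCharsets.UTF_8));
            StringBuilder key = new StringBuilder("search:");
            for (byte b : digest) {
                key.append(String.format("%02x", b & 0xff));
            }
            return key.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }

    /** Returns cached results if present; otherwise runs the search and stores the result. */
    String search(String query, Function<String, String> runSearch) {
        String key = cacheKey(query);
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        String fresh = runSearch.apply(query);
        cache.set(key, fresh, ttlSeconds);
        return fresh;
    }
}
```

Here `runSearch` stands for the real query against the index, returning results already serialized to a string. A short expiry time limits how long stale results are served after the index is updated.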

Otis

----- Original Message ----
From: Marco Vanossi <[hidden email]>
To: [hidden email]
Sent: Tuesday, September 5, 2006 7:53:35 AM
Subject: [Nutch-general] Caching the search results


_______________________________________________
Nutch-general mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/nutch-general




Re: [Nutch-general] Caching the search results

kkrugler
>You may want to consider using memcached -
>http://www.danga.com/memcached/ - it's super simple and super
>stable.  I use it over at Simpy.com and the memcached daemon there
>has been up for months without showing any signs of trouble.

We've had good luck with ehcache (http://ehcache.sourceforge.net).

-- Ken


--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"Find Code, Find Answers"