how to make sure a particular query is ALWAYS cached


britske
I want a couple of costly queries to be cached at all times in the queryResultCache (unless there is a new searcher, of course).

As far as I know, the only parameters that can be supplied to the LRU implementation backing the queryResultCache are size-related, which doesn't give me this guarantee.

What would be my best bet to implement this functionality with the least impact?
1. Use a User/Generic cache. This would result in a separate code path in the application, which I would like to avoid.
2. Extend the LRU cache, and extend the request handler, so that a query can carry a parameter indicating that it should be cached at all times. However, this seems like a lot of clutter in those interfaces for a relatively small change.
3. Another option...

best regards,
Geert-Jan

Re: how to make sure a particular query stays cached (and is not overwritten)

britske
The title of my original post was misleading.

// Geert-Jan


Re: how to make sure a particular query is ALWAYS cached

hossman
In reply to this post by britske

: I want a couple of costly queries to be cached at all times in the
: queryResultCache. (unless I have a new searcher of course)

first off: you can ensure that certain queries are in the cache even when
there is a newSearcher: just configure a newSearcher event listener that
forcibly warms the queries you care about.

(this is particularly handy to ensure FieldCache gets populated before any
user queries are processed)
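
For illustration, that kind of listener is configured in the <query> section of solrconfig.xml with the stock solr.QuerySenderListener; the queries and parameters below are placeholders, not anything taken from this thread:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- replace these placeholder warming queries with the costly queries you care about -->
    <lst>
      <str name="q">category:navigation</str>
      <str name="sort">price asc</str>
      <str name="rows">10</str>
    </lst>
    <lst>
      <str name="q">popularity:[10 TO *]</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>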

Second: if i understand correctly, you want a way to put an object in the
cache, and guarantee that it's always in the cache, even if other objects
are more frequently used or more recently used?

that's kind of a weird use case ... can you elaborate a little more on
what exactly your end goal is?


the most straightforward approach i can think of would be a new cache
implementation that "permanently" stores the first N items you put in it.
that in combination with the newSearcher warming i described above should
work.
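
For what it's worth, once such a cache class existed, plugging it in would just be a matter of pointing the queryResultCache entry in solrconfig.xml at it; the class name below is purely hypothetical:

<!-- hypothetical cache that pins its first N entries; the class name is illustrative only -->
<queryResultCache
    class="com.example.solr.PinnedLRUCache"
    size="512"
    initialSize="512"
    autowarmCount="256"/>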

: 1. use a User/Generic cache. This would result in a separate code path in
: the application, which I would like to avoid.
: 2. extend the LRU cache, and extend the request handler, so that a query can
: carry a parameter indicating that it should be cached at all times.
: However, this seems like a lot of clutter in those interfaces for a
: relatively small change.

#1 wouldn't really accomplish what you want without #2 as well.




-Hoss


Re: how to make sure a particular query is ALWAYS cached

britske
hossman wrote
: I want a couple of costly queries to be cached at all times in the
: queryResultCache. (unless I have a new searcher of course)

first off: you can ensure that certain queries are in the cache, even if
there is a newSearcher, just configure a newSearcher Event Listener that
forcibly warms the queries you care about.

(this is particularly handy to ensure FieldCache gets populated before any
user queries are processed)

Second: if i understand correctly, you want a way to put an object in the
cache, and guarantee that it's always in the cache, even if other objects
are more frequently used or more recently used?

that's kind of a weird use case ... can you elaborate a little more on
what exactly your end goal is?
Sure. Actually, I got the idea from another thread posted by Thomas, to which you gave a reply a few days ago: http://www.nabble.com/Result-grouping-options-tf4522284.html#a12900630.
I quote the relevant bits below, although I think you remember:

hossman wrote
: Is it possible to use faceting to not only get the facet count but also the
: top-n documents for every facet
: directly? If not, how hard would it be to implement this as an extension?

not hard ... a custom request handler could subclass
StandardRequestHandler, call super.handleRequest, and then pull the field
faceting info out of the response object, and fetch a DocList for each of
the top N field constraints.
I have about a dozen queries that I want to have permanently cached, each corresponding to a particular navigation page. Each of these pages has up to about 10 top-N lists which are populated as discussed above. These lists are pretty static (updated once a day, together with the index).

The above would enable me to populate all the lists on a single page in one pass, correct?
Although I haven't tried it yet, I can't imagine that such a request returns in sub-second time, which is what I want (having an index of about 1M docs with 6000 fields per doc and about 10 complex facet queries per request).

The navigation pages are pretty important for, eh well, navigation ;-) and although I can rely on frequent access to these pages most of the time, it is not guaranteed (so neither is the caching).

hossman wrote
the most straightforward approach i can think of would be a new cache
implementation that "permanently" stores the first N items you put in it.
that in combination with the newSearcher warming i described above should
work.

: 1. use a User/Generic cache. This would result in a separate code path in
: the application, which I would like to avoid.
: 2. extend the LRU cache, and extend the request handler, so that a query can
: carry a parameter indicating that it should be cached at all times.
: However, this seems like a lot of clutter in those interfaces for a
: relatively small change.

#1 wouldn't really accomplish what you want without #2 as well.



-Hoss
Regarding #1:
Wouldn't making a user cache for the sole purpose of storing these queries be enough? I could then reference this user cache by name and extract the correct query result (at least, that's how I read the documentation; I have no previous experience with the user-cache mechanism). In that case I don't need #2, right? Or is this not a good way to handle things for some other reason?

//Geert-Jan

RE: how to make sure a particular query is ALWAYS cached

Lance Norskog-2
You could make these filter queries. Filters live in a separate cache, and as
long as that cache holds more entries than you have such queries, they will
remain pinned in RAM. Your code has to remember these special queries in
special-case code and create a dummy query string to fetch the filter query;
"field:[* TO *]" will do nicely.
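
As a sketch of what that implies in solrconfig.xml (the sizes and the example field names are placeholders under my own assumptions, not values from this thread):

<!-- keep the filterCache comfortably larger than the number of filters you rely on -->
<filterCache
    class="solr.LRUCache"
    size="4096"
    initialSize="1024"
    autowarmCount="1024"/>

<!-- a request would then pass the costly clause as a filter, e.g.
     /select?q=field:[* TO *]&fq=category:navigation&rows=10 -->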

Cheers,

Lance Norskog



RE: how to make sure a particular query is ALWAYS cached

britske
I need the documents in order, so the filterCache is no use here. Moreover, I already use a lot of the filterCache for other fq queries: about 99% of the 6000 fields I mentioned have their values cached separately in the filterCache. There must be room for optimization there, but that's a different story ;-)

//Geert-Jan

Lance Norskog wrote
You could make these filter queries. Filters are a separate cache and as
long as you have more cache than queries they will remain pinned in RAM.
Your code has to remember these special queries in special-case code, and
create dummy query strings to fetch the filter query.  "field:[* TO *]" will
do nicely.

Cheers,

Lance Norskog


Re: how to make sure a particular query is ALWAYS cached

hossman
In reply to this post by britske

: Although I haven't tried it yet, I can't imagine that such a request returns in
: sub-second time, which is what I want (having an index of about 1M docs with
: 6000 fields per doc and about 10 complex facet queries per request).

i wouldn't necessarily assume that :)

If you have a request handler which does a query with a facet.field, and
then does a follow-up query for the top N constraints in that facet.field,
the time needed to execute that handler on a cold index should primarily
depend on the faceting aspect and how many unique terms there are in that
field.  try it and see.

: The navigation pages are pretty important for, eh well, navigation ;-) and
: although I can rely on frequent access to these pages most of the time, it
: is not guaranteed (so neither is the caching)

if i were in your shoes: i wouldn't worry about it.  i would set up
"cold cache warming" of the important queries using a firstSearcher event
listener, i would set up autowarming on the caches, i would set up explicit
warming of queries using the sort fields i care about in a newSearcher event
listener, and i would make sure to tune my caches so that they were big
enough to contain a much larger number of entries than are used by my
custom request handler for the queries i care about (especially if my index
only changes a few times a day, the caches become a huge win in that case,
so throw everything you've got at them)
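
Pulling those pieces together, a sketch of the relevant bits of the <query> section of solrconfig.xml could look like this (cache sizes, field names and queries are illustrative assumptions, not recommendations from this thread):

<query>
  <!-- caches sized well above what the navigation pages need; autowarming carries entries across searchers -->
  <queryResultCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="4096"/>
  <filterCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="4096"/>

  <!-- cold cache warming: run the important queries before the first user request -->
  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst><str name="q">category:navigation</str><str name="sort">price asc</str></lst>
    </arr>
  </listener>

  <!-- re-warm the sort fields (and anything else you care about) whenever a new searcher is opened -->
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst><str name="q">*:*</str><str name="sort">price asc</str></lst>
    </arr>
  </listener>
</query>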

and for the record: i've been in your shoes.

From a purely theoretical standpoint: if enough other requests are coming
in fast enough to expunge the objects used by your "important" navigation
pages from the caches ... then those pages aren't that important (at least
not to your end users as an aggregate)

on the other hand: if you've got discrete pools of users (like say:
customers who do searches, vs your boss who thinks navigation pages are
really important) then another approach is to have two ports serving
queries -- one that you send your navigation-type queries to (with the
caches tuned appropriately) and one that you send other traffic to (with
caches tuned appropriately) ... i do that for one major index, it makes a
lot of sense when you have very distinct usage profiles and you want to
get the most bang for your buck cache-wise.


: > #1 wouldn't really accomplish what you want without #2 as well.

: regarding #1.
: Wouldn't making a user-cache for the sole-purpose of storing these queries
: be enough? I could then reference this user-cache by name, and extract the

only if you also write a custom request handler ... that was my point
before it was clear that you were already doing that no matter what (you
had a custom request handler listed in #2)

you could definitely make sure to explicitly put all of your DocLists in
your own user cache, and that will certainly work.  but frankly, based on
what you've described about your use case, and how often your data
changes, it would probably be easier to set up a layer of caching in front
of Solr (since you are concerned with ensuring *all* of the data
for these important pages gets cached) ... something like an HTTP reverse
proxy cache (aka: accelerator proxy) would help you ensure that these whole
pages were getting cached.
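
For reference, a user cache of that kind is declared by name in the <query> section of solrconfig.xml and then looked up by that name from the custom handler; the cache name and sizes below are illustrative only:

<!-- named user cache a custom request handler could populate with its DocLists -->
<cache name="navigationPageCache"
       class="solr.LRUCache"
       size="64"
       initialSize="16"
       autowarmCount="16"/>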

i've never tried it, but in theory: you could even set up a newSearcher
event listener to trigger a little script to ping your proxy with a
request that forced it to revalidate the query when your index changes.



-Hoss


Re: how to make sure a particular query is ALWAYS cached

britske
Separating requests over two ports is a nice solution when you have multiple user types. I like that, although I don't think I need it for this case.

I'm just going to go the 'normal' caching route and see where that takes me, instead of assuming up front that it can't be done :-)

Thanks!

