HTTP caching and distributed search

HTTP caching and distributed search

Charlie Jackson
Currently, I've got a Solr setup in which we're distributing searches
across two cores on a machine, say core1 and core2. I'm toying with the
notion of enabling Solr's HTTP caching on our system, but I noticed an
oddity when using it in combination with distributed searching. Say, for
example, I have this query:

http://localhost:8080/solr/core1/select/?q=google&start=0&rows=10&shards=localhost:8080/solr/core1,localhost:8080/solr/core2

Both cores have HTTP caching enabled, and it seems to be working. First
time I run the query through Squid, it correctly sees it doesn't have
this cached and so requests it from Solr. Second time I request it, it
hits the Squid cache. That part works fine.

Here's the problem. If I commit to core1, it changes the ETag value of
the request, which will invalidate the cache, as it should. But
committing to core2 doesn't, so I get the cached version back, even
though core2 has changed and the cache is stale. I'm guessing this is
because the request is going against core1, hence using core1's cache
values, but in a distributed search, it seems like it should be using
cache values from all cores in the shards parameter. Is this a known
issue, and if so, is there a patch for it?

Thanks,

Charlie


Re: HTTP caching and distributed search

Shalin Shekhar Mangar
On Wed, Feb 3, 2010 at 12:21 AM, Charlie Jackson <[hidden email]> wrote:

> Currently, I've got a Solr setup in which we're distributing searches
> across two cores on a machine, say core1 and core2. I'm toying with the
> notion of enabling Solr's HTTP caching on our system, but I noticed an
> oddity when using it in combination with distributed searching. Say, for
> example, I have this query:
>
> http://localhost:8080/solr/core1/select/?q=google&start=0&rows=10&shards=localhost:8080/solr/core1,localhost:8080/solr/core2
>
>
> Both cores have HTTP caching enabled, and it seems to be working. First
> time I run the query through Squid, it correctly sees it doesn't have
> this cached and so requests it from Solr. Second time I request it, it
> hits the Squid cache. That part works fine.
>
> Here's the problem. If I commit to core1, it changes the ETag value of
> the request, which will invalidate the cache, as it should. But
> committing to core2 doesn't, so I get the cached version back, even
> though core2 has changed and the cache is stale. I'm guessing this is
> because the request is going against core1, hence using core1's cache
> values, but in a distributed search, it seems like it should be using
> cache values from all cores in the shards parameter. Is this a known
> issue, and if so, is there a patch for it?
>
>
You are right, the ETag is calculated using the searcher on core1 only and it
does not take the other shards into account. Can you open a Jira issue?

--
Regards,
Shalin Shekhar Mangar.
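
The behaviour described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Solr's actual ETag computation: the function names and the index version numbers are hypothetical, and the hash is just a convenient way to turn versions into an opaque validator. It compares a validator derived from core1 alone with one derived from every shard named in the shards parameter.

```python
import hashlib


def etag_from_versions(versions):
    """Build a quoted validator string from a {core name: index version} mapping."""
    # Sort by core name so the order of the shards does not change the result.
    joined = ",".join(f"{name}={version}" for name, version in sorted(versions.items()))
    return '"' + hashlib.sha1(joined.encode("utf-8")).hexdigest() + '"'


def coordinator_only_etag(versions, coordinator="core1"):
    """Toy model of the reported behaviour: only the receiving core counts."""
    return etag_from_versions({coordinator: versions[coordinator]})


def all_shards_etag(versions):
    """Toy model of the desired behaviour: every shard contributes."""
    return etag_from_versions(versions)


# Hypothetical index versions before and after a commit to core2 only.
before = {"core1": 100, "core2": 200}
after_core2_commit = {"core1": 100, "core2": 201}

# With the core1-only validator the cache still looks fresh after the commit.
stale = coordinator_only_etag(before) == coordinator_only_etag(after_core2_commit)
# With the all-shards validator the commit to core2 changes the ETag.
changed = all_shards_etag(before) != all_shards_etag(after_core2_commit)
```

In this model a commit to core2 leaves the core1-only validator untouched, which matches the stale responses served by Squid, while the all-shards validator changes whenever any shard's version does.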

Re: HTTP caching and distributed search

hossman

: > http://localhost:8080/solr/core1/select/?q=google&start=0&rows=10&shards=localhost:8080/solr/core1,localhost:8080/solr/core2

: You are right, etag is calculated using the searcher on core1 only and it
: does not take other shards into account. Can you open a Jira issue?

...as a possible workaround, I would suggest creating a separate
"coordinator" core that is neither core1 nor core2. It doesn't have to
have any docs in it; it just has to have a schema consistent with the other
two cores. That way you can use distinct <httpCaching /> settings on
the coordinator core: perhaps never304="true" with an explicit
<cacheControl/> setting, or lastModifiedFrom="openTime", in which case you
could send an explicit "commit" to the (empty) coordinator core any time
you modify one of the shards.



-Hoss


RE: HTTP caching and distributed search

Charlie Jackson
I tried your suggestion, Hoss, but committing to the new coordinator
core doesn't change the indexVersion and therefore the ETag value isn't
changed.

I opened a new JIRA issue for this:
http://issues.apache.org/jira/browse/SOLR-1765


Thanks,
Charlie


-----Original Message-----
From: Chris Hostetter [mailto:[hidden email]]
Sent: Thursday, February 04, 2010 2:16 PM
To: [hidden email]
Subject: Re: HTTP caching and distributed search


: > http://localhost:8080/solr/core1/select/?q=google&start=0&rows=10&shards=localhost:8080/solr/core1,localhost:8080/solr/core2

: You are right, etag is calculated using the searcher on core1 only and it
: does not take other shards into account. Can you open a Jira issue?

...as a possible workaround, I would suggest creating a separate
"coordinator" core that is neither core1 nor core2. It doesn't have to
have any docs in it; it just has to have a schema consistent with the other
two cores. That way you can use distinct <httpCaching /> settings on
the coordinator core: perhaps never304="true" with an explicit
<cacheControl/> setting, or lastModifiedFrom="openTime", in which case you
could send an explicit "commit" to the (empty) coordinator core any time
you modify one of the shards.



-Hoss


RE: HTTP caching and distributed search

hossman

: I tried your suggestion, Hoss, but committing to the new coordinator
: core doesn't change the indexVersion and therefore the ETag value isn't
: changed.

Hmmm... so the "empty" commit doesn't change the indexVersion? I
didn't realize that.

Well, I suppose you could replace your empty commit with an update to a
bogus document. It's hackish, but it should work:

http://host/solr/coordinator/update?stream.body=<add><doc><field name="bogus">bogus</field></doc></add>&commit=true




-Hoss
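
The update URL above contains characters such as `<`, `"` and spaces that usually need percent-encoding before a client or proxy will pass them through unchanged. A minimal sketch of building the same request with Python's standard library follows. The function name is hypothetical, the host and core names are the placeholders from the URL above, and whether the coordinator core accepts a `bogus` field or a `stream.body` parameter depends on its schema and configuration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def bogus_update_url(base_url="http://host/solr/coordinator"):
    """Return an update URL that adds a throwaway document and commits."""
    # Same XML payload as in the message above.
    body = '<add><doc><field name="bogus">bogus</field></doc></add>'
    # urlencode percent-encodes the XML so it is safe inside a query string.
    query = urlencode({"stream.body": body, "commit": "true"})
    return f"{base_url}/update?{query}"


url = bogus_update_url()
# Decoding the query string again recovers the original parameters.
decoded = parse_qs(urlsplit(url).query)
```

The resulting URL could be requested after each change to core1 or core2, so that the coordinator's index version (and with it the ETag) changes as well.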