request handler and caches

Erik Hatcher
I build a "facet" cache in my request handler, but I need it to get  
refreshed when the index changes.  How can my custom request handler  
manage this cache and get notified when the index changes?

Thanks,
        Erik


Re: request handler and caches

Yonik Seeley
On 5/10/06, Erik Hatcher <[hidden email]> wrote:
> I build a "facet" cache in my request handler, but I need it to get
> refreshed when the index changes.  How can my custom request handler
> manage this cache and get notified when the index changes?

The easiest way is to let Solr keep the cache (use a custom user cache
defined in the solrconfig.xml) and implement a Regenerator that is
called to create and refresh a new instance when the searcher is
changed.
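
To make the regenerator idea concrete, here is a minimal stand-alone sketch. It does not use Solr's classes: the Regenerator interface, the warm() helper, and the integer "index version" are all hypothetical stand-ins, chosen only to show a new cache being rebuilt from the old cache's keys when the searcher changes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model, NOT Solr's API: shows a new cache being warmed from
// the entries of the old one when a new searcher is opened.
class RegeneratorSketch {
    /** Assumed callback: recompute one entry against the new index view. */
    interface Regenerator<K, V> {
        V regenerate(int newIndexVersion, K oldKey, V oldValue);
    }

    /** Build the new cache by replaying up to autowarmCount old entries. */
    static <K, V> Map<K, V> warm(Map<K, V> oldCache, int newIndexVersion,
                                 int autowarmCount, Regenerator<K, V> regen) {
        Map<K, V> fresh = new LinkedHashMap<>();
        int done = 0;
        for (Map.Entry<K, V> e : oldCache.entrySet()) {
            if (done++ >= autowarmCount) break;
            fresh.put(e.getKey(),
                      regen.regenerate(newIndexVersion, e.getKey(), e.getValue()));
        }
        return fresh;
    }

    public static void main(String[] args) {
        Map<String, String> old = new LinkedHashMap<>();
        old.put("genre", "facets@v1");
        old.put("year", "facets@v1");
        // The lambda ignores the old value and rebuilds for version 2.
        Map<String, String> fresh =
            warm(old, 2, 1024, (version, key, oldVal) -> "facets@v" + version);
        System.out.println(fresh); // {genre=facets@v2, year=facets@v2}
    }
}
```

The old cache is left untouched, so requests against the old searcher keep working while the new cache is being filled.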

Does that suit your needs?

-Yonik

Re: request handler and caches

Yonik Seeley
Here's an example of the configuration from solrconfig.xml:

    <!-- Example of a generic cache.  These caches may be accessed by name
         through SolrIndexSearcher.getCache(),cacheLookup(), and cacheInsert().
         The purpose is to enable easy caching of user/application level data.
         The regenerator argument should be specified as an implementation
         of solr.search.CacheRegenerator if autowarming is desired.  -->
    <!--
    <cache name="myUserCache"
      class="solr.LRUCache"
      size="4096"
      initialSize="1024"
      autowarmCount="1024"
      regenerator="org.mycompany.mypackage.MyRegenerator"
      />
    -->

-Yonik


Re: request handler and caches

Erik Hatcher
In reply to this post by Yonik Seeley
It probably suits my needs perfectly!   My apologies for asking yet  
another question that can be answered by reading the configuration  
file :)

I'm refactoring now to use a SolrCache and a regenerator to see how
it goes.

        Erik


Re: request handler and caches

Chris Hostetter-3
In reply to this post by Yonik Seeley

: The easiest way is to let Solr keep the cache (use a custom user cache
: defined in the solrconfig.xml) and implement a Regenerator that is
: called to create and refresh a new instance when the searcher is
: changed.

If you have some reason why you can't or don't want to use a SolrCache,
the other trick you can use is to have a special query param on your
RequestHandler that tells it to do a bunch of work that will load the data
into your cache, and then configure firstSearcher and newSearcher events
to "ping" your handler with that param...

    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst> <str name="qt">your_custom_handler</str>
              <str name="force_cache_update">1</str>
        </lst>
      </arr>
    </listener>
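
On the handler side, the trick could look roughly like the sketch below. It is a stand-alone model rather than a real Solr request handler: request parameters are a plain Map, the parameter name force_cache_update is an assumption that only has to match what the listener sends, and loadFacets() is a placeholder for the expensive work.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a request handler; NOT Solr's API.
class WarmableHandlerSketch {
    // Assumed parameter name; it only has to match the listener config.
    static final String FORCE_PARAM = "force_cache_update";

    private Map<String, Integer> facetCache; // null until first built
    int rebuilds = 0;                        // exposed so the effect is visible

    String handle(Map<String, String> params) {
        boolean force = "1".equals(params.get(FORCE_PARAM));
        if (force || facetCache == null) {
            facetCache = loadFacets();
            rebuilds++;
        }
        if (force) {
            return "cache rebuilt"; // warming ping: nothing else to do
        }
        return "facets=" + facetCache;
    }

    // Placeholder for the expensive work done against the index.
    private Map<String, Integer> loadFacets() {
        Map<String, Integer> m = new HashMap<>();
        m.put("genre:poetry", 3);
        return m;
    }

    public static void main(String[] args) {
        WarmableHandlerSketch h = new WarmableHandlerSketch();
        Map<String, String> ping = new HashMap<>();
        ping.put(FORCE_PARAM, "1");
        System.out.println(h.handle(ping));            // cache rebuilt
        System.out.println(h.handle(new HashMap<>())); // facets={genre:poetry=3}
        System.out.println(h.rebuilds);                // 1
    }
}
```

Ordinary requests reuse the cache, and only the warming ping (or a cold start) pays for the rebuild.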

I recommend using a SolrCache instead of this approach though, because
Regenerators for SolrCaches have the benefit of knowing state from the
old cache when they populate the new cache... but this approach works
really well for "pre-filling" a cache on server startup (firstSearcher).


-Hoss


Re: request handler and caches

Erik Hatcher
In reply to this post by Yonik Seeley
I've started down this route, but I'm not sure how to initialize my  
cache the first time.

I need access to the IndexReader to build the cache, and at this  
point I don't need any incremental cache updates - if a new  
IndexSearcher is swapped in, I want to rebuild the cache.

Should I combine a custom SolrCache with a newSearcher listener to  
have it generated right away?   I put in a dummy cache and  
regenerator, but only see the cache .init() method being called, not  
the warm() method (on application server startup).  How can I  
bootstrap it such that my cache gets built on app. startup?

Thanks,
        Erik



Re: request handler and caches

Yonik Seeley
On 5/10/06, Erik Hatcher <[hidden email]> wrote:

> I've started down this route, but I'm not sure how to initialize my
> cache the first time.
>
> I need access to the IndexReader to build the cache, and at this
> point I don't need any incremental cache updates - if a new
> IndexSearcher is swapped in, I want to rebuild the cache.
>
> Should I combine a custom SolrCache with a newSearcher listener to
> have it generated right away?   I put in a dummy cache and
> regenerator, but only see the cache .init() method being called, not
> the warm() method (on application server startup).  How can I
> bootstrap it such that my cache gets built on app. startup?

If your cache is populated as the result of any request to your
plugin, simply send a request via a firstSearcher hook.  If it's not
populated for any request, then send a special request that your
plugin would recognize as a "populate cache" request.

-Yonik

Re: request handler and caches

Erik Hatcher

On May 10, 2006, at 3:01 PM, Yonik Seeley wrote:

> On 5/10/06, Erik Hatcher <[hidden email]> wrote:
>> I've started down this route, but I'm not sure how to initialize my
>> cache the first time.
>>
>> I need access to the IndexReader to build the cache, and at this
>> point I don't need any incremental cache updates - if a new
>> IndexSearcher is swapped in, I want to rebuild the cache.
>>
>> Should I combine a custom SolrCache with a newSearcher listener to
>> have it generated right away?   I put in a dummy cache and
>> regenerator, but only see the cache .init() method being called, not
>> the warm() method (on application server startup).  How can I
>> bootstrap it such that my cache gets built on app. startup?
>
> If your cache is populated as the result of any request to your
> plugin, simply send a request via a firstSearcher hook.  If it's not
> populated for any request, then send a special request that your
> plugin would recognize as a "populate cache" request.

Sorry I'm being dense today, though I really do appreciate the  
incredibly fast response time you and Hoss have on this.  My cache is  
not available in newSearcher() at startup time:

public class CacheFacetsListener implements SolrEventListener {
   public void init(NamedList namedList) {
   }

   public void postCommit() {
     throw new UnsupportedOperationException();
   }

   public void newSearcher(SolrIndexSearcher newSearcher,
                           SolrIndexSearcher currentSearcher) {
     try {
       SolrCache cache = newSearcher.getCache("facet_cache");
       if (cache == null) {
         System.out.println("!!!!! cache is null");
       }
       cache.warm(newSearcher, null);
     } catch (IOException e) {
       log.severe(e.getMessage());
     }
   }
}

I'm getting the "cache is null" message, though the cache is created
and init()'d, as I see its diagnostic output in Jetty's console
before the NPE:

public class FacetCache implements SolrCache {
   private Map facetCache;  // key is field, and key to inner map is value
   private State state;

   private void loadFacets(IndexReader reader) throws IOException {
     System.out.println("Loading facets for " + reader.numDocs() + " documents ...");

     // ....

     System.out.println("Done loading facets.");
   }


   public Object init(Map args, Object persistence,
                      CacheRegenerator regenerator) {
     state=State.CREATED;
     System.out.println("<<<<< FacetCache.init >>>>>");
     return persistence;
   }

   // ......

}

And from solrconfig.xml:

     <cache name="facet_cache"
       class="org.nines.FacetCache"
     />

The console has this output:

May 10, 2006 3:44:26 PM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening Searcher@cfe790 main
<<<<< FacetCache.init >>>>>
May 10, 2006 3:44:26 PM org.apache.solr.core.SolrCore registerSearcher
INFO: Registered new searcher Searcher@cfe790 main
!!!!! cache is null
May 10, 2006 3:44:26 PM org.apache.solr.core.SolrException log
SEVERE: java.lang.NullPointerException
         at org.nines.CacheFacetsListener.newSearcher(CacheFacetsListener.java:24)
         at org.apache.solr.core.SolrCore$2.call(SolrCore.java:427)


I'm probably making this more difficult than it needs to be, but  
today I'm slow :)   What am I doing wrong?

Thanks,
        Erik


Re: request handler and caches

Yonik Seeley
Unless you really need special cache behavior, can you use the
LRUCache that comes with Solr?  It will make your life easier using a
well tested cache implementation.  You can size it large enough so
that items never get dropped if that's the issue.

You shouldn't really need to implement your own Listener either (while
valid, it's a tougher approach).  You could just send your plugin a
message via the builtin QuerySenderListener.

    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst> <str name="q">fast_warm</str> <str name="start">0</str> <str name="rows">10</str> </lst>
      </arr>
    </listener>

If you do choose to implement your own listener, you need to register
it, like above.
Details on how your facet cache is supposed to work might help with
answering future questions.

-Yonik


Re: request handler and caches

Erik Hatcher
On May 10, 2006, at 4:11 PM, Yonik Seeley wrote:
> Unless you really need special cache behavior, can you use the
> LRUCache that comes with Solr?

Sure, I suppose I could use that, but it had more bells and whistles  
than I need.  I like to look at the interfaces and work from there  
and then use base classes that fit.  LRUCache didn't seem to fit what  
I wanted without contorting its configuration.

>   It will make your life easier using a
> well tested cache implementation.

My cache is really just a static cache of BitSet's for a fixed set of  
fields and their values.  With my current index size, creating the  
cache is incredibly fast (a second or so), but the index will grow  
much larger.

> You shouldn't really need to implement your own Listener either (while
> valid, it's a tougher approach).  You could just send your plugin a
> message via the builtin QuerySenderListener.
>
>    <listener event="firstSearcher" class="solr.QuerySenderListener">
>      <arr name="queries">
>        <lst> <str name="q">fast_warm</str> <str name="start">0</str> <str name="rows">10</str> </lst>
>      </arr>
>    </listener>

I see.... which is basically what I was already doing by having the  
cache lazy initialize on the first request in the request handler  
(aka plugin), except the first request is coming from startup thanks  
to the listener architecture.

> If you do choose to implement your own listener, you need to register
> it, like above.

I did, but omitted it from my previous details:

        <listener event="firstSearcher" class="org.nines.CacheFacetsListener"/>

> Details on how your facet cache is supposed to work might help with
> answering future questions.

For a fixed set of fields (currently 4 or so of them) I'm building a  
HashMap keyed by field name, with the values of each key also a  
HashMap, keyed by term value.  The value of the inner HashMap is a  
BitSet representing all documents that have that value for that  
field.  These BitSets are used for a faceted browser and ANDed  
together based on user criteria, as well as combined with full-text  
queries using QueryFilter's BitSet.  Nothing fancy, and perhaps  
something Solr already helps provide?
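
The structure described above can be sketched with nothing but java.util. This is only an illustration under assumptions: the class and method names are hypothetical, document ids are plain ints, and the "full-text result" is a hand-built BitSet rather than anything produced by QueryFilter.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

class FacetBitSetSketch {
    // field name -> (term value -> ids of documents having that value)
    private final Map<String, Map<String, BitSet>> facets = new HashMap<>();

    void add(String field, String term, int docId) {
        facets.computeIfAbsent(field, f -> new HashMap<>())
              .computeIfAbsent(term, t -> new BitSet())
              .set(docId);
    }

    /** Number of documents in 'base' that also carry field=term. */
    int count(BitSet base, String field, String term) {
        Map<String, BitSet> terms = facets.get(field);
        BitSet bits = (terms == null) ? null : terms.get(term);
        if (bits == null) return 0;
        BitSet copy = (BitSet) base.clone(); // and() mutates its receiver
        copy.and(bits);
        return copy.cardinality();
    }

    public static void main(String[] args) {
        FacetBitSetSketch f = new FacetBitSetSketch();
        for (int doc : new int[] {0, 1, 3}) f.add("genre", "poetry", doc);
        for (int doc : new int[] {1, 2}) f.add("year", "1850", doc);

        // Pretend these documents matched a full-text query.
        BitSet fullText = new BitSet();
        fullText.set(1);
        fullText.set(3);
        fullText.set(4);

        System.out.println(f.count(fullText, "genre", "poetry")); // 2
        System.out.println(f.count(fullText, "year", "1850"));    // 1
    }
}
```

Cloning before and() matters here: the cached BitSets are shared between requests, so they must never be modified in place.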

The question still remains - why isn't my cache available from a  
firstSearcher .newSearcher() method?  The cache is created prior (as  
noted in the console output).

        Erik




Re: request handler and caches

Yonik Seeley
On 5/10/06, Erik Hatcher <[hidden email]> wrote:
> For a fixed set of fields (currently 4 or so of them) I'm building a
> HashMap keyed by field name, with the values of each key also a
> HashMap, keyed by term value.  The value of the inner HashMap is a
> BitSet representing all documents that have that value for that
> field.  These BitSets are used for a faceted browser and ANDed
> together based on user criteria, as well as combined with full-text
> queries using QueryFilter's BitSet.  Nothing fancy, and perhaps
> something Solr already helps provide?

Using Solr's DocSet implementations will dramatically speed up your
faceted browsing and reduce your memory footprint.  You could store
these DocSets yourself (and turn off the filter cache so things aren't
doubly stored), but here is how I might go about it:

In your custom cache, just store the terms for the faceting fields
(everything but the bitsets).
field1 -> [term1, term2, term3, term4]
field2 -> [terma, termb, termc, termd]

Then when it comes time to get the count of items matching query x,
do
  count1 = searcher.numDocs(x, new TermQuery(new Term("field1", "term1")))
  count2 = searcher.numDocs(x, new TermQuery(new Term("field1", "term2")))
  ...

Solr will check the filter cache for "x" and for the TermQuery facets,
and generate them on the fly if they are not found.

What you lose:
  - teeny bit of performance because each facet gets looked up in a
HashMap (I've profiled... this has been negligible for us)

What you gain:
 - re-use of the filtercache (including the filter for the base
query), much faster intersections with less average memory usage &
less garbage produced
 - an ability to easily cap the number of filters used for the facets,
allowing a gradual reduction in performance as cache hits lower,
rather than an OOM.
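
A toy model of that trade-off, for illustration only: the class below is not SolrIndexSearcher, queries are plain strings, and the "index" is a function supplied by the caller. It only shows filters being computed on a cache miss and then reused for every later intersection.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a searcher with a filter cache; NOT Solr's API.
class FilterCacheSketch {
    private final Map<String, BitSet> filterCache = new HashMap<>();
    private final Function<String, BitSet> runQuery; // stand-in for executing a query
    int misses = 0;

    FilterCacheSketch(Function<String, BitSet> runQuery) {
        this.runQuery = runQuery;
    }

    /** Return the cached doc set for a query, computing it on first use. */
    BitSet docSet(String query) {
        BitSet cached = filterCache.get(query);
        if (cached == null) {
            misses++;
            cached = runQuery.apply(query);
            filterCache.put(query, cached);
        }
        return cached;
    }

    /** Size of the intersection of two queries' doc sets. */
    int numDocs(String a, String b) {
        BitSet copy = (BitSet) docSet(a).clone(); // keep cached sets intact
        copy.and(docSet(b));
        return copy.cardinality();
    }

    public static void main(String[] args) {
        // Toy index: query string -> matching document ids.
        Map<String, int[]> index = new HashMap<>();
        index.put("x", new int[] {1, 3, 4});
        index.put("field1:term1", new int[] {0, 1, 3});
        index.put("field1:term2", new int[] {4});

        FilterCacheSketch searcher = new FilterCacheSketch(q -> {
            BitSet bits = new BitSet();
            for (int doc : index.getOrDefault(q, new int[0])) bits.set(doc);
            return bits;
        });

        System.out.println(searcher.numDocs("x", "field1:term1")); // 2
        System.out.println(searcher.numDocs("x", "field1:term2")); // 1
        System.out.println(searcher.misses);                       // 3
    }
}
```

The base query "x" is computed once and reused for both facets, which is the filter-cache reuse listed above. A bounded map in place of the HashMap would give the "cap the number of filters" behaviour.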

> The question still remains - why isn't my cache available from a
> firstSearcher .newSearcher() method?  The cache is created prior (as
> noted in the console output).

Great question... it should be, as it's created and registered in the
SolrIndexSearcher constructor.  Is the cache's name() returning the
right thing?

-Yonik

Re: request handler and caches

Chris Hostetter-3

: Great question... it should be, as it's created and registered in the
: SolrIndexSearcher constructor.  is the cache .name() returning the
: right thing?

I was just about to ask that. It wasn't until I started digging into
CacheConfig and SolrIndexSearcher because of this thread that I realized
it doesn't matter what "name" attribute you give your cache in the config,
the SolrCache implementation itself is responsible for specifying the name
that can be used to access it through SolrIndexSearcher.getCache()

if you define your MyCache.name function to be...

   public String name() { return "foo"; }

then even if you have...

     <cache name="facet_cache"
       class="org.foo.MyCache"
     />

...you'll access your cache using the name "foo".

if you want to pay attention to the name specified in the config, it's
the responsibility of your init method to get it from the Map of args.

(you didn't mention what your name() method looks like, but you did
include your init method, and i can see you aren't looking at the Map at
all)
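
A small stand-alone model of that behaviour, with hypothetical classes standing in for the cache and the searcher (the two-argument init() here is a simplification, not the real SolrCache signature): the registry is keyed by whatever name() returns, so a cache that never reads "name" from its init args cannot be found under its configured name.

```java
import java.util.HashMap;
import java.util.Map;

class CacheNameSketch {
    /** Stand-in for a cache implementation; only the naming part is modelled. */
    static class FacetCache {
        private String name;

        Object init(Map<String, String> args, Object persistence) {
            name = args.get("name"); // pick up name="..." from the config entry
            return persistence;
        }

        String name() {
            return name;
        }
    }

    /** Stand-in for the searcher: caches are registered under cache.name(). */
    static class Searcher {
        private final Map<String, FacetCache> caches = new HashMap<>();

        void register(FacetCache c) {
            caches.put(c.name(), c);
        }

        FacetCache getCache(String name) {
            return caches.get(name);
        }
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("name", "facet_cache");

        FacetCache cache = new FacetCache();
        cache.init(config, null);

        Searcher searcher = new Searcher();
        searcher.register(cache);
        System.out.println(searcher.getCache("facet_cache") == cache); // true
    }
}
```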



-Hoss


Re: request handler and caches

Erik Hatcher

On May 10, 2006, at 5:31 PM, Chris Hostetter wrote:

>
> : Great question... it should be, as it's created and registered in  
> the
> : SolrIndexSearcher constructor.  is the cache .name() returning the
> : right thing?
>
> I was just about to ask that. It wasn't until I started digging into
> CacheConfig and SolrIndexSearcher because of this thread that I realized
> it doesn't matter what "name" attribute you give your cache in the config,
> the SolrCache implementation itself is responsible for specifying the name
> that can be used to access it through SolrIndexSearcher.getCache()
>
> if you define your MyCache.name function to be...
>
>    public String name() { return "foo"; }
>
> then even if you have...
>
>      <cache name="facet_cache"
>        class="org.foo.MyCache"
>      />
>
> ...you'll access your cache using the name "foo".

Ah, that was an issue in my code then.  I simply had all of the  
"unnecessary" methods implemented returning a default value (null for  
Object return values).  I now return the value of args.get("name")  
which is "facet_cache" in my case... but I'm still getting the same NPE.

> if you want to pay attention to the name specified in the config, it's
> the responsibility of your init method to get it from the Map of args.

Odd :)  But, I've adjusted my code to account for this.  Why would
you ever want different names than what is specified in the config
file?  Also, it is confusing because both name() and getName()
methods are required to implement a SolrCache.

        Erik


Re: request handler and caches

Erik Hatcher

On May 10, 2006, at 6:38 PM, Erik Hatcher wrote:

> Ah, that was an issue in my code then.  I simply had all of the
> "unnecessary" methods implemented returning a default value (null
> for Object return values).  I now return the value of
> args.get("name") which is "facet_cache" in my case... but I'm still
> getting the same NPE.

Uh, never mind.... my e-mail was written over the course of a few  
minutes as I was trying things, and I inadvertently returned the name  
from getName() instead of name().  All is now well.

        Erik


Re: request handler and caches

Yonik Seeley
In reply to this post by Erik Hatcher
On 5/10/06, Erik Hatcher <[hidden email]> wrote:
> Odd :)  But, I've adjusted my code to account for this.  Why would
> you ever want different names than what is specified in the config
> file?

You probably wouldn't... as far as I remember, it was just the easiest
way to get the name at the time.  Many parts of Solr were done in an
extreme rapid-apps type environment... I implemented it as fast as I
could, no peer review, often past midnight, etc ;-)

The cache name can be nice for the cache to have when implementing
logging or toString() for instance.  Since I already had the name, I
used it as the key.  Something like CacheConfig (it represents the
entry in solrconfig.xml) could have a getCacheName(), removing the
need for the cache to keep track of it.

>  Also, it is confusing because there is a name() and getName()
> methods required to implement a SolrCache.

Definitely... I hadn't even noticed that before :-)

-Yonik

Re: request handler and caches

Chris Hostetter-3

: way to get the name at the time.  Many parts of Solr were done in an
: extreme rapid-apps type environment... I implemented it as fast as I
: could, no peer review, often past midnight, etc ;-)

Ah the good old days, when I'd send Yonik mail ~5PM Pacific requesting a
feature that I needed in my plugin, assuming he'd have replied with an
estimate of how long it would take by the time I got into work the
following morning (he's on the east coast)... only to get a surprise email
from him at midnight Pacific saying that he had a prototype ready if I
wanted to try it and was going to bed... I'd play with it and give him
some feedback on the API, and then when I'd show up at the office around
10AM Pacific the next morning, he'd have been working on it for 3 hours
already and already be done with the damn thing.

: >  Also, it is confusing because there is a name() and getName()
: > methods required to implement a SolrCache.

It's been bugging me that so many of those "plugin" related classes don't
have in-depth javadocs... this thread has prompted me to make a list of
the "biggies" on the TaskList... it will give me something to do next week
when I'm not allowed in my office building because they're moving
everyone.


-Hoss


Re: request handler and caches

Yonik Seeley
Something I forgot to mention: there are other, easier alternatives for
associating user data with a particular searcher.

For example, MyData (in your case Map<FieldName,Terms>) could simply
be the single item in a solr user cache.  Another option is like how
Lucene caches FilteredQuery (basically a
WeakHashMap<SolrIndexSearcher,MyStuff>)

In either case, no regenerators or custom listeners are needed... just
configure a request to be sent to your plugin on both firstSearcher
and newSearcher events, and program your plugin to regenerate MyStuff
if it's not in the cache.
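
The WeakHashMap variant might look roughly like this. It is a generic sketch with hypothetical names: S stands in for the searcher type and D for MyStuff, and the builder function is whatever regenerates the data for a given searcher.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Per-searcher data that disappears once the searcher is garbage collected.
// Caveat: the stored data must not strongly reference its searcher, or the
// weak key can never be collected.
class PerSearcherData<S, D> {
    private final Map<S, D> bySearcher = new WeakHashMap<>();
    private final Function<S, D> build;

    PerSearcherData(Function<S, D> build) {
        this.build = build;
    }

    /** Return the data for this searcher, building it on first request. */
    synchronized D get(S searcher) {
        D data = bySearcher.get(searcher);
        if (data == null) {
            data = build.apply(searcher);
            bySearcher.put(searcher, data);
        }
        return data;
    }

    public static void main(String[] args) {
        int[] builds = {0};
        PerSearcherData<Object, String> facets =
            new PerSearcherData<>(s -> "terms#" + (++builds[0]));

        Object searcher1 = new Object(); // plain objects stand in for searchers
        Object searcher2 = new Object();
        System.out.println(facets.get(searcher1)); // terms#1
        System.out.println(facets.get(searcher1)); // terms#1 (reused)
        System.out.println(facets.get(searcher2)); // terms#2 (new searcher)
    }
}
```

With this shape, the warming request sent on firstSearcher and newSearcher only has to call get() once with the new searcher.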

-Yonik

Re: request handler and caches

Chris Hostetter-3
In reply to this post by Erik Hatcher

I was so preoccupied with trying to understand why your cache wasn't
working that I didn't even register what you said about how you are using
it...

: My cache is really just a static cache of BitSet's for a fixed set of
: fields and their values.  With my current index size, creating the
: cache is incredibly fast (a second or so), but the index will grow
: much larger.

        ...

: For a fixed set of fields (currently 4 or so of them) I'm building a
: HashMap keyed by field name, with the values of each key also a
: HashMap, keyed by term value.  The value of the inner HashMap is a
: BitSet representing all documents that have that value for that
: field.  These BitSets are used for a faceted browser and ANDed
: together based on user criteria, as well as combined with full-text
: queries using QueryFilter's BitSet.  Nothing fancy, and perhaps
: something Solr already helps provide?

Solr definitely makes this easier.  All you really need to keep track of
(either in your user cache, or in hardcoded logic) is the Queries you want
to have faceting on (TermQueries grouped by field it sounds like).  if you
want to know how many docs any two facets have in common (or that
your user's query has in common with a facet) use...

    int count = searcher.numDocs(facetQ1, facetQ2);

...or if you just want to know the number of docs in a single facet use
searcher.getDocSet(q).size().  (there's also getDocSet(List<Query>) if you
have an arbitrary number of facets you want to intersect)

Just about all of the methods in SolrIndexSearcher will automatically
cache that DocSet in the filterCache so that any time you do
anything involving those Queries no actual search is done, and
the cache will be autowarmed whenever a newSearcher is opened.

If you size the filterCache big enough, and register a seed query in the
firstSearcher listener you'll never spend time waiting for any of the
facet DocSets to be calculated.
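
To make the caching behaviour above concrete, here is a JDK-only analogy. It is not Solr's implementation: FilterCacheSketch, the String query keys, and the search function are all assumptions, and java.util.BitSet stands in for DocSet. It only shows that a query's doc set is computed once and that later intersection counts reuse the cached set.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** JDK-only stand-in for the filterCache idea; not Solr's implementation. */
class FilterCacheSketch {
    private final Map<String, BitSet> cache = new HashMap<>();
    // Hypothetical hook that runs the real search for a query string.
    private final Function<String, BitSet> search;
    int searchesRun = 0;

    FilterCacheSketch(Function<String, BitSet> search) {
        this.search = search;
    }

    /** Returns the cached doc set, running the search only on a cache miss. */
    BitSet getDocSet(String query) {
        BitSet docs = cache.get(query);
        if (docs == null) {
            docs = search.apply(query);
            searchesRun++;
            cache.put(query, docs);
        }
        return docs;
    }

    /** Number of documents that match both queries. */
    int numDocs(String q1, String q2) {
        // Clone so the cached set is not modified by the intersection.
        BitSet both = (BitSet) getDocSet(q1).clone();
        both.and(getDocSet(q2));
        return both.cardinality();
    }
}
```

Calling numDocs twice with the same pair of queries runs each underlying search only once; the second call is answered entirely from the cache.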



-Hoss


Re: request handler and caches

Chris Hostetter-3

I almost forgot ... if/when you want to "apply" some of those facets to a
query provided by your user, put the queries for each facet into a list
and use...

List<Query> facetsToApply = ...
DocList result = searcher.getDocList(mainQuery, facetsToApply,
                                     yourSort, 0, 20,
                                     SolrIndexSearcher.GET_SCORES);

...and the filterCache will be used for each facet, the cached DocSets will
all be intersected, and the resulting DocSet will be converted to a Filter
that will be applied when your mainQuery is executed.
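
The intersect-then-filter step described above can be pictured with plain JDK types. This is a simplified sketch, not what SolrIndexSearcher does internally: FilteredPageSketch and its methods are hypothetical, BitSet stands in for DocSet, and the filter is applied to an already ranked array of doc ids rather than during the search itself.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical illustration of "intersect the facet sets, then filter". */
class FilteredPageSketch {
    /** Intersects all facet doc sets into a single filter. */
    static BitSet intersectAll(List<BitSet> facetDocSets, int maxDoc) {
        BitSet filter = new BitSet(maxDoc);
        filter.set(0, maxDoc); // start from "all documents"
        for (BitSet facet : facetDocSets) {
            filter.and(facet);
        }
        return filter;
    }

    /** Returns one page (offset, len) of ranked doc ids that pass the filter. */
    static List<Integer> page(int[] rankedHits, BitSet filter, int offset, int len) {
        List<Integer> out = new ArrayList<>();
        int seen = 0;
        for (int doc : rankedHits) {
            if (!filter.get(doc)) continue;   // rejected by a facet
            if (seen++ < offset) continue;    // still before the requested page
            if (out.size() == len) break;     // page is full
            out.add(doc);
        }
        return out;
    }
}
```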


-Hoss


Re: request handler and caches

Erik Hatcher
In reply to this post by Yonik Seeley
Thanks to Hoss and Yonik again(!) for their valuable assistance  
pointing me to better ways to do what I want with facets within  
Solr's infrastructure.  Very helpful.

At this point I need to pragmatically put the DocSet refactoring on  
hold to accomplish some other things, but I did get the SolrCache and  
firstSearcher event listener working using my BitSet's and will  
tackle the DocSet migration in the near future.

A couple of questions about DocSet's though, so that I'm confident  
I'll be able to get the same functionality...

Along with a BitSet for each term in selected fields, I also store a  
"catchall" BitSet that is an OR'd BitSet of all term BitSets and then  
flipped (using BitSet.or() and .flip()).  How can I flip a DocSet or  
achieve the same sort of thing?  This catchall BitSet is used to show  
"<unspecified>" on the user interface for that field, to allow  
someone to select all documents that do not have any terms in that  
field.
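
For reference, the "catchall" construction described above might look like the following with java.util.BitSet. The class and method names are hypothetical, and maxDoc is assumed to be the number of document ids in the index. This shows the BitSet version being asked about, not a DocSet equivalent.

```java
import java.util.BitSet;
import java.util.Collection;

/** Hypothetical helper for the "<unspecified>" facet value. */
class CatchallSketch {
    /** Docs with no term at all in the field: NOT (term1 OR term2 OR ...). */
    static BitSet unspecified(Collection<BitSet> termBitSets, int maxDoc) {
        BitSet catchall = new BitSet(maxDoc);
        for (BitSet termDocs : termBitSets) {
            catchall.or(termDocs);
        }
        // Flip only the valid doc id range [0, maxDoc).
        catchall.flip(0, maxDoc);
        return catchall;
    }
}
```

Note that a plain complement also sets the bits of any deleted document ids inside the range, so those may need to be masked out separately.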

Also, we allow for inverted facet selection as well, allowing a user  
to select all documents that do not have a specified value.  I  
currently accomplish this in my loop to build up an aggregate  
constraint BitSet by using its .andNot() method.  How can I  
accomplish this using DocSet's?
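
A small sketch of the constraint loop described above, again with java.util.BitSet. ConstraintSketch and Selection are hypothetical names, and starting from "all documents" is an assumption about how the aggregate is seeded.

```java
import java.util.BitSet;
import java.util.List;

/** Hypothetical aggregate-constraint builder with inverted selections. */
class ConstraintSketch {
    /** One facet selection; inverted means "does NOT have this value". */
    static final class Selection {
        final BitSet docs;
        final boolean inverted;

        Selection(BitSet docs, boolean inverted) {
            this.docs = docs;
            this.inverted = inverted;
        }
    }

    /** ANDs normal selections and AND-NOTs inverted ones into one BitSet. */
    static BitSet apply(List<Selection> selections, int maxDoc) {
        BitSet result = new BitSet(maxDoc);
        result.set(0, maxDoc); // assumed seed: every document matches
        for (Selection s : selections) {
            if (s.inverted) {
                result.andNot(s.docs);
            } else {
                result.and(s.docs);
            }
        }
        return result;
    }
}
```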

If I can achieve these capabilities without too much effort, then my  
DocSet refactoring will happen sooner rather than later :)

Again thanks for all the help and rapid response.  Most helpful, and  
also shows that Solr is alive, vibrant, and extremely capable.

        Erik




On May 10, 2006, at 5:23 PM, Yonik Seeley wrote:

> On 5/10/06, Erik Hatcher <[hidden email]> wrote:
>> For a fixed set of fields (currently 4 or so of them) I'm building a
>> HashMap keyed by field name, with the values of each key also a
>> HashMap, keyed by term value.  The value of the inner HashMap is a
>> BitSet representing all documents that have that value for that
>> field.  These BitSets are used for a faceted browser and ANDed
>> together based on user criteria, as well as combined with full-text
>> queries using QueryFilter's BitSet.  Nothing fancy, and perhaps
>> something Solr already helps provide?
>
> Using Solr's DocSet implementations will dramatically speed up your
> faceted browsing and reduce your memory footprint.  You could store
> these DocSets yourself (and turn off the filter cache so things aren't
> doubly stored), but here is how I might go about it:
>
> In your custom cache, just store the terms for the faceting fields
> (everything but the bitsets).
> field1 -> [term1, term2, term3, term4]
> field2 -> [terma, termb, termc, termd]
>
> Then when it comes time to get the count of items matching query x,
> do
>  count1 = searcher.numDocs(x,TermQuery(term1))
>  count2 = searcher.numDocs(x,TermQuery(term2))
>  ...
>
> Solr will check the filter cache for "x" and for the TermQuery facets,
> and generate them on the fly if they are not found.
>
> What you lose:
>  - teeny bit of performance because each facet gets looked up in a
> HashMap (I've profiled... this has been negligible for us)
>
> What you gain:
> - re-use of the filtercache (including the filter for the base
> query), much faster intersections with less average memory usage &
> less garbage produced
> - an ability to easily cap the number of filters used for the facets,
> allowing a gradual reduction in performance as cache hits lower,
> rather than an OOM.
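
The approach quoted above (store only the terms per field, count against cached filters, and cap the cache) can be sketched with JDK types only. Everything here is an assumption for illustration: FacetCountSketch, the "field:term" String keys, and the search function are hypothetical, BitSet stands in for DocSet, and a LinkedHashMap plays the role of an LRU cache.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical facet counter over a capped, LRU-evicting filter cache. */
class FacetCountSketch {
    // Hypothetical hook that computes the doc set for a "field:term" key.
    private final Function<String, BitSet> search;
    private final Map<String, BitSet> filterCache;

    FacetCountSketch(Function<String, BitSet> search, final int maxEntries) {
        this.search = search;
        // Access-ordered map that drops the least recently used entry past the cap,
        // so a cache that is too small costs recomputation rather than memory.
        this.filterCache = new LinkedHashMap<String, BitSet>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, BitSet> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** For each term of each field, counts docs that also match the base set. */
    Map<String, Integer> counts(BitSet base, Map<String, List<String>> termsByField) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : termsByField.entrySet()) {
            for (String term : e.getValue()) {
                String key = e.getKey() + ":" + term;
                BitSet docs = filterCache.get(key);
                if (docs == null) {
                    docs = search.apply(key); // cache miss: generate on the fly
                    filterCache.put(key, docs);
                }
                BitSet both = (BitSet) docs.clone(); // keep the cached set intact
                both.and(base);
                out.put(key, both.cardinality());
            }
        }
        return out;
    }

    int cachedFilters() {
        return filterCache.size();
    }
}
```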

