request handler and caches

Re: request handler and caches

Yonik Seeley
On 5/11/06, Erik Hatcher <[hidden email]> wrote:
> A couple of questions about DocSets though, so that I'm confident
> I'll be able to get the same functionality...
>
> Along with a BitSet for each term in selected fields, I also store a
> "catchall" BitSet that is an OR'd BitSet of all term BitSets

An efficient union isn't implemented yet.  The current union() method
creates a new DocSet, and it isn't optimized for speed with
HashDocSets.

I think we'd want to do one of the following:
 - create a mutatingUnion(DocSet other) to prevent repeated creation
of a new DocSet,
 - create a union(Collection<DocSet>), or
 - create an addTo(BitSet target)
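
A rough sketch of what those three options could look like, using plain
java.util.BitSet (and an int[] of doc ids for the hash-style case) in place
of DocSet.  The class name and signatures here are hypothetical and are not
part of Solr:

```java
import java.util.BitSet;
import java.util.Collection;

// Hypothetical sketch: java.util.BitSet stands in for a DocSet.
// None of these methods exist in Solr; they mirror the options listed above.
final class UnionSketch {

    // Option 1: mutate the target in place instead of allocating a new set.
    static void mutatingUnion(BitSet target, BitSet other) {
        target.or(other);
    }

    // Option 2: union a whole collection with a single allocation.
    static BitSet union(Collection<BitSet> sets) {
        BitSet result = new BitSet();
        for (BitSet s : sets) {
            result.or(s); // inputs are left untouched
        }
        return result;
    }

    // Option 3: a hash-style set of doc ids adds itself to a caller-owned BitSet.
    static void addTo(int[] docIds, BitSet target) {
        for (int id : docIds) {
            target.set(id);
        }
    }
}
```

A "catchall" set for a field would then be one union(...) call over that
field's per-term sets, or a loop of mutatingUnion/addTo calls into a single
target.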

> How can I flip a DocSet or
> achieve the same sort of thing?

Currently not implemented... we either could implement it (flip on a
HashDocSet will be big though), or implement some stuff like
ChainedFilter (have a NotDocSet that wraps a DocSet).  If memory is a
concern, the latter sounds like the right way to implement that one.
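
To make the wrapper idea concrete, here is a minimal sketch.  SimpleDocSet
is a hypothetical two-method interface invented for this example (the real
DocSet has more to it), and maxDoc is assumed to be known and to bound
every id in the wrapped set; deleted documents are ignored:

```java
import java.util.BitSet;

// Hypothetical minimal interface for illustration only.
interface SimpleDocSet {
    boolean exists(int docId);
    int size();
}

// BitSet-backed set, standing in for a concrete DocSet implementation.
final class BitSimpleDocSet implements SimpleDocSet {
    private final BitSet bits;

    BitSimpleDocSet(BitSet bits) { this.bits = bits; }

    public boolean exists(int docId) { return bits.get(docId); }

    public int size() { return bits.cardinality(); }
}

// Lazy complement: stores only a reference to the wrapped set plus maxDoc,
// so "flipping" a small hash-based set never materializes a big one.
final class NotDocSet implements SimpleDocSet {
    private final SimpleDocSet wrapped;
    private final int maxDoc; // assumed: all valid doc ids are in [0, maxDoc)

    NotDocSet(SimpleDocSet wrapped, int maxDoc) {
        this.wrapped = wrapped;
        this.maxDoc = maxDoc;
    }

    public boolean exists(int docId) {
        return docId >= 0 && docId < maxDoc && !wrapped.exists(docId);
    }

    public int size() { return maxDoc - wrapped.size(); }
}
```

The trade-off is the usual one for wrappers: no extra memory, but every
membership test pays for one more level of indirection.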

> Also, we allow for inverted facet selection as well, allowing a user
> to select all documents that do not have a specified value.

So for a certain facet like "platform:pc", you also allow for "-platform:pc"?
If this is a common enough thing for faceted browsing, we should
probably build in support for that in the Solr APIs somehow (w/o
storing DocSets for both).

> I currently accomplish this in my loop to build up an aggregate
> constraint BitSet by using its .andNot() method.  How can I
> accomplish this using DocSets?

It's not there yet, but I'd be in favor of andNot functionality in DocSet.
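
For reference, the BitSet-based loop described in the quoted text might
look roughly like the following.  This is a guess at the shape of it, not
the actual code; ConstraintSketch and Constraint are hypothetical names,
and maxDoc is assumed to be the number of documents in the index:

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch of building an aggregate constraint BitSet.
final class ConstraintSketch {

    // One facet constraint: the docs having the value, and whether the
    // user selected it in the negative sense ("-platform:pc").
    static final class Constraint {
        final BitSet docs;
        final boolean inverted;

        Constraint(BitSet docs, boolean inverted) {
            this.docs = docs;
            this.inverted = inverted;
        }
    }

    static BitSet apply(int maxDoc, List<Constraint> constraints) {
        BitSet result = new BitSet(maxDoc);
        result.set(0, maxDoc); // start with every document
        for (Constraint c : constraints) {
            if (c.inverted) {
                result.andNot(c.docs); // drop docs that have the value
            } else {
                result.and(c.docs);    // keep only docs that have the value
            }
        }
        return result;
    }
}
```

An andNot on DocSet would let the same loop run without dropping down to
raw BitSets, and without storing a second set for the negated facet.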

> If I can achieve these capabilities without too much effort, then my
> DocSet refactoring will happen sooner rather than later :)

Looks like it might be a little later ;-)
It's great to see the requirements that others have though!

Do you facet on all terms for a particular set of fields, or are the
terms to be faceted on defined outside the system?  If the former,
most of your system would fall into what I would think of as "simple"
faceted browsing, that should be supported by default some day.  The
latter isn't too big of a leap either... maybe with the terms defined
in solrconfig.xml or something.

-Yonik

Re: request handler and caches

Erik Hatcher
On May 11, 2006, at 11:47 AM, Yonik Seeley wrote:
>> Also, we allow for inverted facet selection as well, allowing a user
>> to select all documents that do not have a specified value.
>
> So for a certain facet like "platform:pc", you also allow for
> "-platform:pc"?

Yup!  And it is magically lightning fast with the BitSet stuff I've
implemented.  It is a handy feature in our domain (19th century
literature).  "Show me all documents in 1870 that Dante Gabriel
Rossetti did NOT create" - this is done completely with BitSets when
no full-text queries are used.  The key thing is that the facets
return value/counts for each of the non-zero facets (only the
values for documents that match the constraints).
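
As a simplified illustration of the counting step (not the actual server
code; FacetCountSketch and its parameters are names made up for this
example), each term's BitSet is intersected with the current constraint
set and only non-zero counts are kept:

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of per-value facet counts under a constraint.
final class FacetCountSketch {

    // termBits: one BitSet per facet value in a field.
    // constraint: the docs matching all currently selected constraints.
    static Map<String, Integer> counts(Map<String, BitSet> termBits, BitSet constraint) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : termBits.entrySet()) {
            BitSet hits = (BitSet) e.getValue().clone(); // leave the stored set intact
            hits.and(constraint);
            int n = hits.cardinality();
            if (n > 0) {
                out.put(e.getKey(), n); // zero-count values are omitted
            }
        }
        return out;
    }
}
```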

> If this is a common enough thing for faceted browsing, we should
> probably build in support for that in the Solr APIs somehow (w/o
> storing DocSets for both).

I'm not sure how common an inverted constraint is, but it certainly  
is key to my world :)

> Do you facet on all terms for a particular set of fields, or are the
> terms to be faceted on defined outside the system?  If the former,
> most of your system would fall into what I would think of as "simple"
> faceted browsing, that should be supported by default some day.  The
> latter isn't too big of a leap either... maybe with the terms defined
> in solrconfig.xml or something.

I'm afraid to let folks outside my group bang on it, but the non-Solr  
architecture (XML-RPC-based Lucene search server) is up and running  
here: http://www.nines.org/search/browse (be nice, and also note that  
it may very well go down as this is not a production-quality  
deployment).  The UI is a bit sluggish because of the fairly large  
(by HTML standards, not Lucene) number of facet values being  
rendered.  But you'll see that you can add any number of  
constraints.  Things get faster to render as the set is constrained.  
The pie charts and numbers are all dynamic based on the current  
constraints.  A constraint can be added in the negative sense by  
clicking the "-", or it can be toggled once added by clicking the "+"  
or "-" link.

The faceted fields are currently hard-coded - they require special
indexing considerations (indexed, but not tokenized).  And the set of
values in each field is fairly limited, but the agent (author,
creator, artist, etc.) is the most unconstrained one.  I'm looking
forward to refactoring to DocSets to leverage the LRU cache
goodness for this case as our data grows.
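
For anyone unfamiliar with the idea, an LRU cache of per-term sets can be
sketched in a few lines on top of java.util.LinkedHashMap.  This only
illustrates the eviction behavior; it is not Solr's cache, and the class
name and String keys are assumptions made for the example:

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a bounded cache that evicts the least recently
// used entry, keyed here by something like "field:value".
final class LruBitSetCache extends LinkedHashMap<String, BitSet> {
    private final int maxEntries;

    LruBitSetCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder=true: get() refreshes an entry
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, BitSet> eldest) {
        return size() > maxEntries; // evict once the bound is exceeded
    }
}
```

With a field like agent, where the number of values is large, this keeps
the frequently used sets in memory and lets the rarely used ones be
rebuilt on demand.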

        Erik



Re: request handler and caches

Chris Hostetter
In reply to this post by Yonik Seeley

: > How can I flip a DocSet or
: > achieve the same sort of thing?
:
: Currently not implemented... we either could implement it (flip on a

That's not entirely true ... DocSet does have a "getBits()" method that
can be used to either access the underlying BitSet of a DocSet, or create
a new BitSet that represents the DocSet (if the underlying implementation
isn't already using a BitSet) ... from there you could do BitSet
operations to your heart's content, and then construct a new DocSet ... but
getBits is deprecated.
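
A sketch of the BitSet half of that workaround, assuming the bits come
back as a java.util.BitSet and that maxDoc (the number of documents in the
index) is known.  FlipSketch is a hypothetical helper; wrapping the result
back into a DocSet is left out:

```java
import java.util.BitSet;

// Hypothetical helper: eager complement of a set of doc ids.
final class FlipSketch {

    static BitSet flipped(BitSet bits, int maxDoc) {
        // Clone first: the BitSet may be the DocSet's own underlying
        // storage, so flipping it in place could corrupt a cached set.
        BitSet copy = (BitSet) bits.clone();
        copy.flip(0, maxDoc); // toggles ids in [0, maxDoc)
        return copy;
    }
}
```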

In the long run, adding all of the methods currently in the BitSet class
to the DocSet interface would be mighty nice.




-Hoss


Re: request handler and caches

Yonik Seeley
On 5/11/06, Chris Hostetter <[hidden email]> wrote:
> DocSet does have a "getBits()" method that
> can be used to either access the underlying BitSet of a DocSet, or create
> a new BitSet that represents the DocSet (if the underlying implementation
> isn't already using a BitSet) ... from there you could do BitSet
> operations to your heart's content, and then construct a new DocSet ... but
> getBits is deprecated.

Yeah... that reminds me of why I deprecated it - I'd like to replace
BitSet with my faster implementation that I've been twiddling with
slowly on my own time.  Maybe now is the right time before more people
start using Solr in production...  I'll refresh my memory and see what
else needs to be done.

-Yonik