LinkedIn open source project: kamikaze/lucene-ext

Lance Norskog
LinkedIn open-sourced a pile of DocSet compression implementations as
"Lucene-Ext", or "kamikaze":
http://code.google.com/p/lucene-ext/wiki/Kamikaze

Has anyone looked at using these in Solr?

--
Lance Norskog
[hidden email]
650-922-8831 (US)
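For readers new to DocSet compression, here is a minimal sketch of one simple scheme: store the sorted doc ids as gaps and write each gap in variable-byte form (7 bits per byte, high bit set when more bytes follow). This scheme was picked for brevity and is an assumption. It is not the format Kamikaze itself uses, and the class name DeltaVByteDocSet is hypothetical. The sketch also shows the main cost. Each gap is relative to the previous id, so contains() has to decode from the start and cannot jump to an arbitrary doc.

```java
import java.io.ByteArrayOutputStream;

// Sketch of a compressed doc-id set: delta gaps + variable-byte encoding.
// Hypothetical class; illustrative only, not Kamikaze's actual encoding.
final class DeltaVByteDocSet {
    private final byte[] data; // encoded gaps
    private final int size;    // number of doc ids stored

    // sortedIds must be strictly increasing, non-negative doc ids.
    DeltaVByteDocSet(int[] sortedIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int id : sortedIds) {
            int gap = id - prev; // the first gap is the id itself
            prev = id;
            // Emit 7 bits per byte; the high bit marks "more bytes follow".
            while ((gap & ~0x7F) != 0) {
                out.write((gap & 0x7F) | 0x80);
                gap >>>= 7;
            }
            out.write(gap); // last byte of this gap: high bit clear
        }
        this.data = out.toByteArray();
        this.size = sortedIds.length;
    }

    // Decodes one gap starting at data[pos[0]] and advances pos[0] past it.
    private int readGap(int[] pos) {
        int gap = 0, shift = 0, b;
        do {
            b = data[pos[0]++] & 0xFF;
            gap |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return gap;
    }

    // Full decode back to the sorted id array.
    int[] toArray() {
        int[] ids = new int[size];
        int[] pos = {0};
        int doc = 0;
        for (int i = 0; i < size; i++) {
            doc += readGap(pos);
            ids[i] = doc;
        }
        return ids;
    }

    // Membership test: must decode sequentially from the start (O(n)).
    boolean contains(int target) {
        int[] pos = {0};
        int doc = 0;
        for (int i = 0; i < size; i++) {
            doc += readGap(pos);
            if (doc == target) return true;
            if (doc > target) return false; // ids are sorted, so stop early
        }
        return false;
    }

    int encodedBytes() {
        return data.length;
    }
}
```

For the ids {0, 5, 6, 300, 1000000}, the gaps are 0, 5, 1, 294 and 999700. They encode in 1 + 1 + 1 + 2 + 3 = 8 bytes, compared with 20 bytes as a plain int[].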
Re: LinkedIn open source project: kamikaze/lucene-ext

Yonik Seeley
On Sun, Mar 22, 2009 at 11:35 PM, Lance Norskog <[hidden email]> wrote:
> LinkedIn open-sourced a pile of DocSet compression implementations as
> "Lucene-Ext", or "kamikaze":
> http://code.google.com/p/lucene-ext/wiki/Kamikaze
>
> Has anyone looked at using these in Solr?

The big question would be "use for what?"

DocSets are often used for fast intersections when doing faceting, and
most forms of compression would greatly hurt that performance.
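To make the point about intersection speed concrete, here is a minimal sketch of facet counting over uncompressed bitsets. The bitsets are stored as long[] words, and each facet value's count is the size of its intersection with the base result set. The intersection is one AND and one popcount per 64 documents, with no decoding step. The class and method names are hypothetical and are not Solr's DocSet API.

```java
// Sketch: facet counts as |base AND facetDocs| over uncompressed bitsets.
// Hypothetical helper names; not Solr's DocSet API.
final class BitSetIntersect {
    // Builds a bitset with room for maxDoc docs, with the given docs set.
    static long[] bits(int maxDoc, int... docs) {
        long[] words = new long[(maxDoc + 63) >>> 6];
        for (int d : docs) {
            words[d >>> 6] |= 1L << d; // Java masks the shift count to d & 63
        }
        return words;
    }

    // Intersection size without materializing the result set.
    static int intersectionSize(long[] a, long[] b) {
        int n = Math.min(a.length, b.length);
        int count = 0;
        for (int i = 0; i < n; i++) {
            count += Long.bitCount(a[i] & b[i]); // 64 docs per step
        }
        return count;
    }

    // One intersection per facet value's cached bitset.
    static int[] facetCounts(long[] base, long[][] facetSets) {
        int[] counts = new int[facetSets.length];
        for (int f = 0; f < facetSets.length; f++) {
            counts[f] = intersectionSize(base, facetSets[f]);
        }
        return counts;
    }
}
```

A compressed set would first have to be decoded, at least partially, before each of these word-wise ANDs could run.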

That said, because of other Lucene changes/advances, I've considered
moving from a HashDocSet to a sorted list of docids.  Such DocSets could
implement skipTo() and so be usable directly as filters, but they would
be slower for random ID lookup and slower at intersecting a small set
with a large one.
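A minimal sketch of the sorted-list idea follows, assuming a plain int[] of strictly increasing doc ids. The class and method names are hypothetical and do not come from Solr. Sorted order makes a skipTo()-style "advance to the first doc >= target" cheap via binary search, and that is what would let such a set drive a filter. The trade-offs described above also show up here. exists() is O(log n) instead of a hash probe. Intersecting a small set s with a large set L costs about O(|s| log |L|), compared with O(|s|) hash lookups.

```java
import java.util.Arrays;

// Sketch of a DocSet backed by a sorted int[] of doc ids.
// Hypothetical class; not Solr's actual implementation.
final class SortedIntDocSet {
    private final int[] docs; // strictly increasing doc ids

    SortedIntDocSet(int[] sortedDocs) {
        this.docs = sortedDocs;
    }

    // Random lookup is a binary search: O(log n), versus O(1) for a hash set.
    boolean exists(int doc) {
        return Arrays.binarySearch(docs, doc) >= 0;
    }

    // skipTo: index of the first entry >= target in arr[from..],
    // or arr.length if there is none.
    static int skipTo(int[] arr, int from, int target) {
        int i = Arrays.binarySearch(arr, from, arr.length, target);
        return i >= 0 ? i : -i - 1; // "not found" -> insertion point
    }

    // Leapfrog intersection: whichever side is behind skips forward
    // to the other side's current doc.
    int intersectionSize(SortedIntDocSet other) {
        int[] a = docs, b = other.docs;
        int i = 0, j = 0, count = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                count++;
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i = skipTo(a, i + 1, b[j]);
            } else {
                j = skipTo(b, j + 1, a[i]);
            }
        }
        return count;
    }
}
```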

-Yonik
http://www.lucidimagination.com
Re: LinkedIn open source project: kamikaze/lucene-ext

Jason Rutherglen
In reply to Lance Norskog
http://bobo-browse.wiki.sourceforge.net/

For faceting, the Bobo library from LinkedIn may be useful in cases where
the number of cached bitsets is excessive.

On Sun, Mar 22, 2009 at 8:35 PM, Lance Norskog <[hidden email]> wrote:

> LinkedIn open-sourced a pile of DocSet compression implementations as
> "Lucene-Ext", or "kamikaze":
> http://code.google.com/p/lucene-ext/wiki/Kamikaze
>
> Has anyone looked at using these in Solr?
Re: LinkedIn open source project: kamikaze/lucene-ext

Otis Gospodnetic

Hi,

At which point would you say the number of cached bitsets should be considered excessive?  Simply a function of bitset size (index size) and memory/JVM heap?

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----

> From: Jason Rutherglen <[hidden email]>
> To: [hidden email]
> Sent: Tuesday, March 24, 2009 2:48:25 PM
> Subject: Re: LinkedIn open source project: kamikaze/lucene-ext
>
> http://bobo-browse.wiki.sourceforge.net/
>
> For faceting, the Bobo library from LinkedIn may be useful in cases where
> the number of cached bitsets is excessive.

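Otis's question suggests a back-of-envelope calculation: bitset size times the number of cached sets, compared against the cache budget. Here is a minimal sketch, assuming one bit per document (maxDoc) packed into 64-bit words. JVM object headers and cache bookkeeping are ignored. The index size and budget used below are illustrative figures, not numbers from this thread. With 10,000,000 documents, each bitset takes 156,250 longs, or 1,250,000 bytes. A 512 MiB budget (536,870,912 bytes) therefore holds 429 of them.

```java
// Back-of-envelope sizing for a cache of uncompressed bitsets.
// Assumption: 1 bit per doc in 64-bit words; object overhead ignored.
final class BitSetCacheBudget {
    // Bytes needed for one bitset over maxDoc documents (maxDoc > 0).
    static long bytesPerBitSet(int maxDoc) {
        if (maxDoc <= 0) throw new IllegalArgumentException("maxDoc must be positive");
        long words = (maxDoc + 63L) / 64L; // round up to whole longs
        return words * 8L;
    }

    // How many such bitsets fit in the given cache budget.
    static long bitSetsThatFit(long budgetBytes, int maxDoc) {
        return budgetBytes / bytesPerBitSet(maxDoc);
    }
}
```

On this view, "excessive" depends less on any single bitset and more on how many distinct sets a query workload would need to keep cached.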
Re: LinkedIn open source project: kamikaze/lucene-ext

Jason Rutherglen
A good example would be a range query field, where there is a large number
of possible ranges, and caching the bitsets for all of them will not fit in
the allotted cache size.  Counting (faceting) over custom field caches can
be faster, especially when many facets are calculated.
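Here is a minimal sketch of counting over a field cache for the range case. It assumes a per-document value array (values[doc]) and a list of matching doc ids. Each match is bucketed by binary search over the range lower bounds, so no per-range bitset is needed. The class name and the bounds layout are hypothetical and are not Bobo's or Solr's API. Memory is one value per document regardless of how many ranges are counted. Work per query is proportional to the number of matching documents.

```java
import java.util.Arrays;

// Sketch: range facet counts from a per-doc value array ("field cache")
// instead of one cached bitset per range. Hypothetical names.
final class RangeFacetCounter {
    // values[doc]: the field value of each document.
    // matches:     doc ids matching the current query.
    // bounds:      distinct, ascending lower bounds; range k is
    //              [bounds[k], bounds[k+1]) and the last range is open-ended.
    // Returns counts[k] = matching docs whose value falls in range k.
    static int[] count(long[] values, int[] matches, long[] bounds) {
        int[] counts = new int[bounds.length];
        for (int doc : matches) {
            long v = values[doc];
            int k = Arrays.binarySearch(bounds, v);
            if (k < 0) k = -k - 2; // index of the last bound <= v
            if (k >= 0) counts[k]++; // values below bounds[0] fall in no range
        }
        return counts;
    }
}
```

For example, values {5, 15, 25, 35, 45, 1}, matches {0, 1, 2, 3, 5}, and bounds {0, 10, 20, 30} give counts {2, 1, 1, 1}. Doc 4 is not a match, so its value 45 is never counted.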

On Wed, Mar 25, 2009 at 7:55 AM, Otis Gospodnetic <
[hidden email]> wrote:

>
> Hi,
>
> At which point would you say the number of cached bitsets should be
> considered excessive?  Simply a function of bitset size (index size) and
> memory/JVM heap?
>
>  Otis