is OpenBitSet / SortedVIntList compressed bit map index?

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

is OpenBitSet / SortedVIntList compressed bit map index?

First Last
Hi,

is OpenBitSet / SortedVIntList a compressed bit map index? Which one is
better if memory usage is the primary concern ?

Our filters are sparse. So is SortedVIntList better in that case?

Are there any other compressed bitmap index implementations which offer bit
map compression at a decent performance assuming filters are sparse?

I'd appreciate any help on this.Thanks.

- Raavan
Reply | Threaded
Open this post in threaded view
|

Re: is OpenBitSet / SortedVIntList compressed bit map index?

Federico Fissore
First Last, il 07/01/2011 20:55, ha scritto:
> Hi,
>
> is OpenBitSet / SortedVIntList a compressed bit map index? Which one is
> better if memory usage is the primary concern ?
>

SortedVIntList is compressed, OpenBitSet is not


> Our filters are sparse. So is SortedVIntList better in that case?
>

Yes


> Are there any other compressed bitmap index implementations which offer bit
> map compression at a decent performance assuming filters are sparse?
>

I'm too looking for alternative implementations of compressed bitsets,
so I'm too really interested in everybody experience: my primary concern
at the moment is serializing bitsets to recover searcher warmup time

I've tried some and roughly tested them: my conclusion was that we
(lucene users) already stand on the rolls royce of bitset implementations.

federico

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

RE: is OpenBitSet / SortedVIntList compressed bit map index?

Ryan Aylward
I don't recall how we decided to use it, but we are using http://code.google.com/p/compressedbitset/ and it seems to be pretty efficient in terms of memory.

-----Original Message-----
From: Federico Fissore [mailto:[hidden email]]
Sent: Friday, January 07, 2011 3:12 PM
To: [hidden email]
Subject: Re: is OpenBitSet / SortedVIntList compressed bit map index?

First Last, il 07/01/2011 20:55, ha scritto:
> Hi,
>
> is OpenBitSet / SortedVIntList a compressed bit map index? Which one is
> better if memory usage is the primary concern ?
>

SortedVIntList is compressed, OpenBitSet is not


> Our filters are sparse. So is SortedVIntList better in that case?
>

Yes


> Are there any other compressed bitmap index implementations which offer bit
> map compression at a decent performance assuming filters are sparse?
>

I'm too looking for alternative implementations of compressed bitsets,
so I'm too really interested in everybody experience: my primary concern
at the moment is serializing bitsets to recover searcher warmup time

I've tried some and roughly tested them: my conclusion was that we
(lucene users) already stand on the rolls royce of bitset implementations.

federico

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: is OpenBitSet / SortedVIntList compressed bit map index?

First Last
In reply to this post by Federico Fissore
Thanks Federico.

>> my primary concern at the moment is serializing bitsets to recover
searcher warmup time

I am also considering doing the same to reduce warmup time during restarts.

It seems one of the disadvantages of SortedVIntList is the performance
skipTo() as per Paul Elschot since it does not support random access like
OpenBitSet.
https://issues.apache.org/jira/browse/LUCENE-1296

Our primary concern is memory usage since we have hundreds of filters and
large number of documents. So if the performance is decent, I am thinking of
using SortedVIntList for all our sparse filters.

-Raavan

On Fri, Jan 7, 2011 at 3:11 PM, Federico Fissore <[hidden email]>wrote:

> First Last, il 07/01/2011 20:55, ha scritto:
>
>  Hi,
>>
>> is OpenBitSet / SortedVIntList a compressed bit map index? Which one is
>> better if memory usage is the primary concern ?
>>
>>
> SortedVIntList is compressed, OpenBitSet is not
>
>
>
>  Our filters are sparse. So is SortedVIntList better in that case?
>>
>>
> Yes
>
>
>
>  Are there any other compressed bitmap index implementations which offer
>> bit
>> map compression at a decent performance assuming filters are sparse?
>>
>>
> I'm too looking for alternative implementations of compressed bitsets, so
> I'm too really interested in everybody experience: my primary concern at the
> moment is serializing bitsets to recover searcher warmup time
>
> I've tried some and roughly tested them: my conclusion was that we (lucene
> users) already stand on the rolls royce of bitset implementations.
>
> federico
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
Reply | Threaded
Open this post in threaded view
|

Re: is OpenBitSet / SortedVIntList compressed bit map index?

First Last
In reply to this post by Ryan Aylward
Thanks Ryan. I will test this to see if it uses much less memory than
SortedVIntList.

-Raavan

On Fri, Jan 7, 2011 at 4:05 PM, Ryan Aylward <[hidden email]> wrote:

> I don't recall how we decided to use it, but we are using
> http://code.google.com/p/compressedbitset/ and it seems to be pretty
> efficient in terms of memory.
>
> -----Original Message-----
> From: Federico Fissore [mailto:[hidden email]]
> Sent: Friday, January 07, 2011 3:12 PM
> To: [hidden email]
> Subject: Re: is OpenBitSet / SortedVIntList compressed bit map index?
>
> First Last, il 07/01/2011 20:55, ha scritto:
> > Hi,
> >
> > is OpenBitSet / SortedVIntList a compressed bit map index? Which one is
> > better if memory usage is the primary concern ?
> >
>
> SortedVIntList is compressed, OpenBitSet is not
>
>
> > Our filters are sparse. So is SortedVIntList better in that case?
> >
>
> Yes
>
>
> > Are there any other compressed bitmap index implementations which offer
> bit
> > map compression at a decent performance assuming filters are sparse?
> >
>
> I'm too looking for alternative implementations of compressed bitsets,
> so I'm too really interested in everybody experience: my primary concern
> at the moment is serializing bitsets to recover searcher warmup time
>
> I've tried some and roughly tested them: my conclusion was that we
> (lucene users) already stand on the rolls royce of bitset implementations.
>
> federico
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
Reply | Threaded
Open this post in threaded view
|

Re: is OpenBitSet / SortedVIntList compressed bit map index?

First Last
In reply to this post by First Last
Also, just for my understanding, is SortedVIntList able to perform some
operations such as AND/OR without decompression ?

Some of the algorithms mentioned below claim to do that. But I understand
that there are patent issues surrounding these algorithms.
http://en.wikipedia.org/wiki/Bitmap_index

-Raavan

On Sat, Jan 8, 2011 at 10:14 AM, Raavan <[hidden email]> wrote:

> Thanks Federico.
>
> >> my primary concern at the moment is serializing bitsets to recover
> searcher warmup time
>
> I am also considering doing the same to reduce warmup time during restarts.
>
> It seems one of the disadvantages of SortedVIntList is the performance
> skipTo() as per Paul Elschot since it does not support random access like
> OpenBitSet.
> https://issues.apache.org/jira/browse/LUCENE-1296
>
> Our primary concern is memory usage since we have hundreds of filters and
> large number of documents. So if the performance is decent, I am thinking of
> using SortedVIntList for all our sparse filters.
>
> -Raavan
>
>
> On Fri, Jan 7, 2011 at 3:11 PM, Federico Fissore <[hidden email]>wrote:
>
>> First Last, il 07/01/2011 20:55, ha scritto:
>>
>>  Hi,
>>>
>>> is OpenBitSet / SortedVIntList a compressed bit map index? Which one is
>>> better if memory usage is the primary concern ?
>>>
>>>
>> SortedVIntList is compressed, OpenBitSet is not
>>
>>
>>
>>  Our filters are sparse. So is SortedVIntList better in that case?
>>>
>>>
>> Yes
>>
>>
>>
>>  Are there any other compressed bitmap index implementations which offer
>>> bit
>>> map compression at a decent performance assuming filters are sparse?
>>>
>>>
>> I'm too looking for alternative implementations of compressed bitsets, so
>> I'm too really interested in everybody experience: my primary concern at the
>> moment is serializing bitsets to recover searcher warmup time
>>
>> I've tried some and roughly tested them: my conclusion was that we (lucene
>> users) already stand on the rolls royce of bitset implementations.
>>
>> federico
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>>
>
Reply | Threaded
Open this post in threaded view
|

Re: is OpenBitSet / SortedVIntList compressed bit map index?

Raffaella Ventaglio
On Sat, Jan 8, 2011 at 7:24 PM, Raavan <[hidden email]> wrote:

> Also, just for my understanding, is SortedVIntList able to perform some
> operations such as AND/OR without decompression ?
>

No, not natively:
http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/util/SortedVIntList.html

But the *iterator* returns the *docIds* by order and there is a *constructor
*that accepts a *DocIdSetIterator*.

So it's quite easy to implement the AND/OR operations by creating a *
DocIdSetIterator *that receives the two iterators from the original *
SortedVIntLists* and scans them in parallel, implementing *nextDoc* and *
advance* methods accordingly to AND/OR semantic.

We use something like that and, for very sparse bitsets, it is more
efficient than to convert them in *OpenBitSets* in order to perform AND/OR
operations.

Bye
*Raf*
Reply | Threaded
Open this post in threaded view
|

Re: is OpenBitSet / SortedVIntList compressed bit map index?

First Last
Thanks Raf.

On Sun, Jan 9, 2011 at 1:20 AM, Raf <[hidden email]> wrote:

> On Sat, Jan 8, 2011 at 7:24 PM, Raavan <[hidden email]> wrote:
>
> > Also, just for my understanding, is SortedVIntList able to perform some
> > operations such as AND/OR without decompression ?
> >
>
> No, not natively:
>
> http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/util/SortedVIntList.html
>
> But the *iterator* returns the *docIds* by order and there is a
> *constructor
> *that accepts a *DocIdSetIterator*.
>
> So it's quite easy to implement the AND/OR operations by creating a *
> DocIdSetIterator *that receives the two iterators from the original *
> SortedVIntLists* and scans them in parallel, implementing *nextDoc* and *
> advance* methods accordingly to AND/OR semantic.
>
> We use something like that and, for very sparse bitsets, it is more
> efficient than to convert them in *OpenBitSets* in order to perform AND/OR
> operations.
>
> Bye
> *Raf*
>
Reply | Threaded
Open this post in threaded view
|

Re: is OpenBitSet / SortedVIntList compressed bit map index?

Federico Fissore
In reply to this post by First Last
Raavan, il 08/01/2011 19:17, ha scritto:
> Thanks Ryan. I will test this to see if it uses much less memory than
> SortedVIntList.
>
> -Raavan


Hello Raavan,

have you tested those implementations? How do they look?

federico

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: is OpenBitSet / SortedVIntList compressed bit map index?

ai114
In reply to this post by First Last
First Last wrote
Are there any other compressed bitmap index implementations which offer bit
map compression at a decent performance assuming filters are sparse?
Have a look at  EWAH by Daniel Lemire
google:http://code.google.com/p/javaewah/
research paper: http://arxiv.org/abs/0901.3751
code: https://github.com/lemire/javaewah/tree/

Gabriel