BitDocSet (BitSet->OpenBitSet)

BitDocSet (BitSet->OpenBitSet)

Yonik Seeley
Are there any objections to replacing BitSet with OpenBitSet in BitDocSet?
http://issues.apache.org/jira/browse/SOLR-15

This would only impact those using custom query handlers *and*
constructing their own BitSet objects or using the deprecated
DocSet.getBits() method.  Code changes should be simple & minimal
though, as I think I've implemented everything needed for faceted
browsing and the method names should be the same as those on BitSet.
The benefit is the 3 to 4 times speedup for intersection-size
calculations, plus the flexibility of being able to do more efficient
stuff in the future (for example, think chained-filter
word-at-a-time).

-Yonik
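
To make the word-at-a-time idea concrete, here is a minimal sketch (not the SOLR-15 code) that contrasts a bit-at-a-time intersection count over java.util.BitSet with one that ANDs 64-bit words and counts them with Long.bitCount. The class and method names are made up for illustration, and the second method assumes the backing long[] of each set is directly accessible, which java.util.BitSet only offers through a copy.

```java
import java.util.BitSet;

final class IntersectionSizeSketch {

    // Bit-at-a-time: walk the set bits of a and probe b for each one.
    static int intersectionSizeBitwise(BitSet a, BitSet b) {
        int count = 0;
        for (int i = a.nextSetBit(0); i >= 0; i = a.nextSetBit(i + 1)) {
            if (b.get(i)) {
                count++;
            }
        }
        return count;
    }

    // Word-at-a-time: AND 64 bits in one step, then count the surviving bits.
    // Words beyond the shorter array cannot contribute to an intersection.
    static long intersectionSizeWords(long[] a, long[] b) {
        int shared = Math.min(a.length, b.length);
        long count = 0;
        for (int i = 0; i < shared; i++) {
            count += Long.bitCount(a[i] & b[i]);
        }
        return count;
    }

    public static void main(String[] args) {
        BitSet a = new BitSet();
        BitSet b = new BitSet();
        for (int i = 0; i < 1000; i += 2) a.set(i);
        for (int i = 0; i < 1000; i += 3) b.set(i);
        // Both lines should print the same number.
        System.out.println(intersectionSizeBitwise(a, b));
        System.out.println(intersectionSizeWords(a.toLongArray(), b.toLongArray()));
    }
}
```

The main method uses BitSet.toLongArray() (Java 7 and later) only to obtain words for the demonstration; a class that owns its long[] would skip that copy, which is where the benefit would come from.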

Re: BitDocSet (BitSet->OpenBitSet)

Peter Keegan
No objections - please do.

On 6/8/06, Yonik Seeley <[hidden email]> wrote:

>
> Are there any objections to replacing BitSet with OpenBitSet in BitDocSet?
> http://issues.apache.org/jira/browse/SOLR-15
>
> This would only impact those using custom query handlers *and*
> constructing their own BitSet objects or using the deprecated
> DocSet.getBits() method.  Code changes should be simple & minimal
> though, as I think I've implemented everything needed for faceted
> browsing and the method names should be the same as those on BitSet.
> The benefit is the 3 to 4 times speedup for intersection-size
> calculations, plus the flexibility of being able to do more efficient
> stuff in the future (for example, think chained-filter
> word-at-a-time).
>
> -Yonik
>

Re: BitDocSet (BitSet->OpenBitSet)

Erik Hatcher
In reply to this post by Yonik Seeley
Go for it!

        Erik


On Jun 8, 2006, at 3:15 PM, Yonik Seeley wrote:

> Are there any objections to replacing BitSet with OpenBitSet in  
> BitDocSet?
> http://issues.apache.org/jira/browse/SOLR-15
>
> This would only impact those using custom query handlers *and*
> constructing their own BitSet objects or using the deprecated
> DocSet.getBits() method.  Code changes should be simple & minimal
> though, as I think I've implemented everything needed for faceted
> browsing and the method names should be the same as those on BitSet.
> The benefit is the 3 to 4 times speedup for intersection-size
> calculations, plus the flexibility of being able to do more efficient
> stuff in the future (for example, think chained-filter
> word-at-a-time).
>
> -Yonik


Re: BitDocSet (BitSet->OpenBitSet)

eks dev
Yonik, Erik...
What would be the right way to put OpenBitSet into Lucene?

Should it be copied somewhere into the Lucene packages (in order not to create a dependency on Solr)?

I would vote to park it in the Lucene util package or so and then reference it from Solr, as this dependency already exists.



----- Original Message ----
From: Erik Hatcher <[hidden email]>
To: [hidden email]
Sent: Friday, 9 June, 2006 3:22:13 AM
Subject: Re: BitDocSet (BitSet->OpenBitSet)

Go for it!

    Erik


On Jun 8, 2006, at 3:15 PM, Yonik Seeley wrote:

> Are there any objections to replacing BitSet with OpenBitSet in  
> BitDocSet?
> http://issues.apache.org/jira/browse/SOLR-15
>
> This would only impact those using custom query handlers *and*
> constructing their own BitSet objects or using the deprecated
> DocSet.getBits() method.  Code changes should be simple & minimal
> though, as I think I've implemented everything needed for faceted
> browsing and the method names should be the same as those on BitSet.
> The benefit is the 3 to 4 times speedup for intersection-size
> calculations, plus the flexibility of being able to do more efficient
> stuff in the future (for example, think chained-filter
> word-at-a-time).
>
> -Yonik





Re: BitDocSet (BitSet->OpenBitSet)

Chris Hostetter-3
In reply to this post by Yonik Seeley
: Are there any objections to replacing BitSet with OpenBitSet in BitDocSet?
: http://issues.apache.org/jira/browse/SOLR-15

+1

... but I'd feel better about it if there were static utility methods
for converting from BitSet<=>OpenBitSet so that BitDocSet could continue
to have deprecated constructors that take in a BitSet (and could continue
to implement getBits():BitSet)

Those should be fairly straightforward, right? ...

  // Copies every set bit of an OpenBitSet into a new java.util.BitSet.
  public static BitSet convert(final OpenBitSet in) {
    final BitSet out = new BitSet(in.length());
    for(int i=in.nextSetBit(0); i>=0; i=in.nextSetBit(i+1)) {
      out.set(i);
    }
    return out;
  }

(were you planning on changing the declaration of DocSet.getBits() to
return OpenBitSet?)

My only other question about the code in SOLR-15 was whether or not some of
those low level utilities like "pop_xor(long[],long[],int,int):long"
should be protected/private?


-Hoss
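
For illustration, conversion in both directions can be sketched without depending on the OpenBitSet class from SOLR-15 by treating the open bit set as a plain long[] of 64-bit words. That representation, along with the names BitSetWordsSketch, toWords and fromWords, is an assumption made for this example and not part of the patch.

```java
import java.util.BitSet;

final class BitSetWordsSketch {

    // BitSet -> words: bit i lives in words[i >> 6] at position (i & 63).
    static long[] toWords(BitSet in) {
        long[] words = new long[(in.length() + 63) >>> 6];
        for (int i = in.nextSetBit(0); i >= 0; i = in.nextSetBit(i + 1)) {
            words[i >> 6] |= 1L << (i & 63);
        }
        return words;
    }

    // Words -> BitSet: peel off the lowest set bit of each word until it is empty.
    static BitSet fromWords(long[] words) {
        BitSet out = new BitSet(words.length << 6);
        for (int w = 0; w < words.length; w++) {
            long word = words[w];
            while (word != 0) {
                out.set((w << 6) + Long.numberOfTrailingZeros(word));
                word &= word - 1; // clear the lowest set bit
            }
        }
        return out;
    }

    public static void main(String[] args) {
        BitSet original = new BitSet();
        original.set(3);
        original.set(64);
        original.set(500);
        BitSet roundTrip = fromWords(toWords(original));
        // Prints whether the round trip preserved every bit.
        System.out.println(original.equals(roundTrip));
    }
}
```

Either direction visits every set bit and allocates a new structure, so a helper like this is best kept out of code that runs per query.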


Re: BitDocSet (BitSet->OpenBitSet)

Chris Hostetter-3
In reply to this post by eks dev

: Yonik, Erik...
: What would be the right way to put OpenBitSet into Lucene?

I think the right thing to do is put it in Solr since there's a
demonstrated use for it there.

if/when it makes sense to "promote" it into Lucene core we can do that --
but at the moment there really isn't ... the only place BitSets are used
in Lucene is in Filters, and the current push is to remove that API usage
in favor of an iterator style approach.  At that point, it may make sense for
people to implement Filters under the covers with OpenBitSets .. but
until that time comes, Solr seems like the place to put it.


-Hoss


Re: BitDocSet (BitSet->OpenBitSet)

Yonik Seeley
In reply to this post by Chris Hostetter-3
On 6/9/06, Chris Hostetter <[hidden email]> wrote:
> : Are there any objections to replacing BitSet with OpenBitSet in BitDocSet?
> : http://issues.apache.org/jira/browse/SOLR-15
>
> +1
>
> ... but I'd feel better about it if there were static utility methods
> for converting from BitSet<=>OpenBitSet

That might be OK.

> so that BitDocSet could continue
> to have deprecated constructors that take in a BitSet (and could continue
> to implement getBits():BitSet)
> Those should be fairly straightforward, right? ...

I had considered it, but I think the dangers might outweigh the benefits.
I'd almost rather have someone's code break and have an easy way for
them to fix it rather than silently slow it down by an order of
magnitude.

> (were you planning on changing the declaration of DocSet.getBits() to
> return OpenBitSet?)

Yes, I think so.

> My only other question about the code in SOLR-15 was whether or not some of
> those low level utilities like "pop_xor(long[],long[],int,int):long"
> should be protected/private?

It's a library developer's class... too many times I have wanted to do
something as efficiently as something in the Java standard library,
only to find out that the needed methods are package protected.  In
this case, the downside is that it messes up the JavaDoc a bit though.
Still, most users shouldn't even be looking at this class, right?


-Yonik

Re: BitDocSet (BitSet->OpenBitSet)

Yonik Seeley
In reply to this post by Chris Hostetter-3
On 6/9/06, Chris Hostetter <[hidden email]> wrote:
> if/when it makes sense to "promote" it into Lucene core we can do that --

Yeah. Promotion to lucene isn't completely backward compatible for
Solr though (unless we keep a copy both places).  That's why I had
originally kept a local lucene package in Solr... to prevent name
changes in the event something was moved to lucene.

It's probably not too big of a deal if we haven't made a release yet,
and this is more in the "expert use" realm anyway.

-Yonik

Re: BitDocSet (BitSet->OpenBitSet)

Chris Hostetter-3
In reply to this post by Yonik Seeley

: > so that BitDocSet could continue
: > to have deprecated constructors that take in a BitSet (and could continue
: > to implement getBits():BitSet)
: > Those should be fairly straightforward, right? ...
:
: I had considered it, but I think the dangers might outweigh the benefits.
: I'd almost rather have someone's code break and have an easy way for
: them to fix it rather than silently slow it down by an order of
: magnitude.
:
: > (were you planning on changing the declaration of DocSet.getBits() to
: > return OpenBitSet?)
:
: Yes, I think so.

Hmmm ... yeah, I see your point about it being better to break the code in
a way that's easy to fix than have a super slow translation that might
take place a lot.

ok... I'm on board with changing getBits and the constructor for BitDocSet
to replace all BitSet references with OpenBitSet -- but I'd still like to
add those static utilities so people who might have a specific reason to
keep using BitSets will have a way to do it.

: It's a library developer's class... too many times I have wanted to do
: something as efficiently as something in the Java standard library,
: only to find out that the needed methods are package protected.  In
: this case, the downside is that it messes up the JavaDoc a bit though.
: Still, most users shouldn't even be looking at this class, right?

they can be protected without being package protected ... or they could be
public in an even lower level class (a static inner class maybe?) .. but
my main motivation for mentioning it was to try and reduce the amount of
confusion when people look at the BitSet API compared to the OpenBitSet
API ... we want them to look as similar as possible, right?



-Hoss
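
One way to picture the "public in an even lower level class" option is the sketch below: the outer class keeps a small BitSet-like surface, while the array-level helpers sit in a nested static class where library developers can still reach them. The names OpenBitsSketch, WordOps, pop and popXor are hypothetical, and reading the two int parameters of pop_xor as a word offset and a word count is an assumption, since the thread only gives the signature.

```java
final class OpenBitsSketch {
    private final long[] words;

    OpenBitsSketch(long[] words) {
        this.words = words.clone();
    }

    // BitSet-like surface: number of set bits.
    long cardinality() {
        return WordOps.pop(words, 0, words.length);
    }

    // Number of bits set in exactly one of the two sets.
    long xorCardinality(OpenBitsSketch other) {
        int shared = Math.min(words.length, other.words.length);
        long count = WordOps.popXor(words, other.words, 0, shared);
        // Words past the shared prefix XOR against zero, so count them as-is.
        count += WordOps.pop(words, shared, words.length - shared);
        count += WordOps.pop(other.words, shared, other.words.length - shared);
        return count;
    }

    // Low-level helpers, public but kept off the main class's API listing.
    public static final class WordOps {
        private WordOps() {}

        public static long pop(long[] a, int wordOffset, int numWords) {
            long count = 0;
            for (int i = wordOffset, end = wordOffset + numWords; i < end; i++) {
                count += Long.bitCount(a[i]);
            }
            return count;
        }

        public static long popXor(long[] a, long[] b, int wordOffset, int numWords) {
            long count = 0;
            for (int i = wordOffset, end = wordOffset + numWords; i < end; i++) {
                count += Long.bitCount(a[i] ^ b[i]);
            }
            return count;
        }
    }
}
```

With this layout the generated JavaDoc for the outer class stays close to the BitSet method list, and the helpers get a page of their own.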