searchWithFilter bug?

Peter Keegan
I'm having a problem with 'searchWithFilter' on Lucene 2.9.1. The Filter
wraps a simple BitSet. When doing a 'MatchAllDocs' query with this filter, I
get only a subset of the expected results, even accounting for deletes. The
index has 10 segments. In IndexSearcher->searchWithFilter, it looks like the
scorer is advancing to the filter's docId, which is the index-wide value,
but the scorer is using the segment-relative value. If I optimize the index,
I get the expected results.
Does this look like a bug?
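
A minimal sketch of the mismatch described above, written as a simplified plain-Java model rather than against the Lucene API. It assumes the same top-level bit set is handed to every segment and read as segment-relative ids; the class name, method name, and segment sizes are hypothetical.

```java
import java.util.BitSet;

// Simplified plain-Java model of the symptom; no Lucene classes are involved.
final class UnadjustedFilterDemo {

    // Returns the global doc ids a search would report if the same top-level
    // bit set were handed to every segment and read as segment-relative ids.
    static BitSet hitsWithUnadjustedFilter(BitSet topLevelBits, int[] segmentMaxDocs) {
        BitSet reported = new BitSet();
        int docBase = 0;
        for (int maxDoc : segmentMaxDocs) {
            // A match-all scorer for this segment only knows local ids 0..maxDoc-1,
            // so a filter bit at or beyond maxDoc can never line up with it.
            for (int bit = topLevelBits.nextSetBit(0);
                 bit >= 0 && bit < maxDoc;
                 bit = topLevelBits.nextSetBit(bit + 1)) {
                reported.set(docBase + bit); // local id mapped back to a global id
            }
            docBase += maxDoc;
        }
        return reported;
    }

    public static void main(String[] args) {
        int[] segmentMaxDocs = {4, 3, 5}; // hypothetical segment sizes
        BitSet filter = new BitSet();
        filter.set(3);
        filter.set(6);
        filter.set(11);
        System.out.println("filter bits (global ids): " + filter);
        System.out.println("reported hits:            "
                + hitsWithUnadjustedFilter(filter, segmentMaxDocs));
    }
}
```

With these sizes, global id 6 is never matched and id 11 is missing, while an unrelated document is reported in the last segment. With a single segment the reported hits equal the filter bits, which would be consistent with the optimized index behaving correctly.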

Peter

Re: searchWithFilter bug?

Michael McCandless-2
That doesn't sound good.

Though, in searchWithFilter, we seem to ask for the Query's scorer,
and the Filter's docIdSetIterator, using the same reader (which may be
toplevel, for the legacy case, or per-segment, for the normal case).
So I'm not [yet] seeing where the issue is...

Can you boil it down to a smallish test case?

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: searchWithFilter bug?

Peter Keegan
I think the Filter's docIdSetIterator is using the top-level reader for each
segment, because the cardinality of the DocIdSet from which it's created is
the same for all readers (and matches what I expect to see at the top level).

Peter

Re: searchWithFilter bug?

Simon Willnauer
Peter, which filter do you use? Do you respect the IndexReader's
maxDoc() and the docBase?

simon

Re: searchWithFilter bug?

Peter Keegan
The filter is just a java.util.BitSet. I use the top-level reader to create
the filter, and call IndexSearcher.search(Query, Filter, HitCollector). So,
there is no 'docBase' at this level of the API.
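
For reference, a docBase is just the running sum of the maxDoc() values of the preceding segments. A rough plain-Java sketch of that arithmetic follows; it assumes the per-segment maxDoc values are already known (in 2.9 the sub-readers can, I believe, be listed with IndexReader.getSequentialSubReaders()), and the class and method names are made up for illustration.

```java
import java.util.Arrays;

// Plain-Java sketch; the maxDoc values are hypothetical and no Lucene API is called.
final class DocBases {

    // starts[i] is the docBase of segment i; the last entry is the index-wide maxDoc.
    static int[] docBaseStarts(int[] segmentMaxDocs) {
        int[] starts = new int[segmentMaxDocs.length + 1];
        for (int i = 0; i < segmentMaxDocs.length; i++) {
            starts[i + 1] = starts[i] + segmentMaxDocs[i];
        }
        return starts;
    }

    // Finds the segment that contains the given index-wide doc id.
    static int segmentOf(int globalDocId, int[] starts) {
        if (globalDocId < 0 || globalDocId >= starts[starts.length - 1]) {
            throw new IllegalArgumentException("doc id out of range: " + globalDocId);
        }
        int segment = 0;
        while (globalDocId >= starts[segment + 1]) {
            segment++;
        }
        return segment;
    }

    public static void main(String[] args) {
        int[] starts = docBaseStarts(new int[] {4, 3, 5}); // hypothetical segment sizes
        System.out.println("docBase starts: " + Arrays.toString(starts));
        int globalDocId = 6;
        int segment = segmentOf(globalDocId, starts);
        System.out.println("global " + globalDocId + " -> segment " + segment
                + ", local " + (globalDocId - starts[segment]));
    }
}
```

The segment-relative id is the index-wide id minus the docBase of the segment that contains it.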

Peter

Re: searchWithFilter bug?

Simon Willnauer
---------- Forwarded message ----------
From: Simon Willnauer <[hidden email]>
Date: Fri, Dec 4, 2009 at 6:53 PM
Subject: Re: searchWithFilter bug?
To: Peter Keegan <[hidden email]>


Peter, since search is per-segment, you need to use the segment reader
passed in during search to create your DocIdSet. If you use absolute
docIDs, your filter will not work.
Many filters don't need to be segment-aware because they use the given
reader to generate the DocIdSet, like MultiTermQueryWrapperFilter.
DistanceFilter (contrib/spatial) and its subclasses keep state
internally to work with per-segment search.

Maybe this helps to understand:

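 import org.apache.lucene.index.IndexReader;
 import org.apache.lucene.search.DocIdSet;
 import org.apache.lucene.search.Filter;
 import org.apache.lucene.util.OpenBitSet;

 // Nested helper (e.g. inside a test class). Assumes 'docs' holds index-wide
 // docIDs in ascending order, and that getDocIdSet is called exactly once per
 // segment, in segment order. The state is never reset, so use a fresh
 // instance for each search.
 public static final class SimpleDocIdSetFilter extends Filter {
   private int docBase;       // index-wide docID of the current segment's first doc
   private final int[] docs;  // index-wide docIDs accepted by the filter
   private int index;         // next position in 'docs' to consume
   public SimpleDocIdSetFilter(int[] docs) {
     this.docs = docs;
   }
   @Override
   public DocIdSet getDocIdSet(IndexReader reader) {
     final OpenBitSet set = new OpenBitSet();
     final int limit = docBase + reader.maxDoc();  // exclusive upper bound
     for (; index < docs.length; index++) {
       final int docId = docs[index];
       if (docId >= limit)  // belongs to a later segment
         break;
       set.set(docId - docBase);  // convert to a segment-relative docID
     }
     docBase = limit;
     return set.isEmpty() ? null : set;
   }
 }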
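
For a filter that starts from an index-wide java.util.BitSet, as in the original question, the same idea can be sketched without any per-search state by slicing the bit set for each segment. This is plain Java only: the class and method names and the segment sizes are hypothetical, and how a Filter would learn the docBase of the reader it is handed is left open here.

```java
import java.util.BitSet;

// Plain-Java sketch of per-segment slicing; wiring this into a Lucene Filter is not shown.
final class SegmentSlices {

    // Returns the bits for one segment, re-based so that bit 0 is the segment's first doc.
    static BitSet sliceForSegment(BitSet topLevelBits, int docBase, int maxDoc) {
        // BitSet.get(from, to) copies the range [from, to) into a new set starting at index 0.
        return topLevelBits.get(docBase, docBase + maxDoc);
    }

    public static void main(String[] args) {
        int[] segmentMaxDocs = {4, 3, 5}; // hypothetical segment sizes
        BitSet filter = new BitSet();
        filter.set(3);
        filter.set(6);
        filter.set(11);

        int docBase = 0;
        for (int maxDoc : segmentMaxDocs) {
            BitSet local = sliceForSegment(filter, docBase, maxDoc);
            System.out.println("docBase " + docBase + " -> local bits " + local);
            docBase += maxDoc;
        }
    }
}
```

The range is half-open: an id equal to docBase + maxDoc belongs to the next segment. The cardinalities of the slices add up to the cardinality of the top-level set, which is a quick sanity check when debugging a filter like this.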

@Mike: maybe we should add a testcase / method in TestFilteredSearch
that searches on more than one segment.

simon


Re: searchWithFilter bug?

Michael McCandless-2
On Fri, Dec 4, 2009 at 12:53 PM, Simon Willnauer
<[hidden email]> wrote:

> @Mike: maybe we should add a testcase / method in TestFilteredSearch
> that searches on more than one segment.

I agree, we should -- wanna cough up a patch?

Mike

Re: searchWithFilter bug?

Simon Willnauer
On Fri, Dec 4, 2009 at 7:09 PM, Michael McCandless
<[hidden email]> wrote:
> On Fri, Dec 4, 2009 at 12:53 PM, Simon Willnauer
> <[hidden email]> wrote:
>
>> @Mike: maybe we should add a testcase / method in TestFilteredSearch
>> that searches on more than one segment.
>
> I agree, we should -- wanna cough up a patch?
>
> Mike

Working on it... will open an issue in a bit.
