sorting in SolrIndexSearcher

sorting in SolrIndexSearcher

Peter Keegan
I'm looking at the following code from SolrIndexSearcher.getDocListNC:

      final FieldSortedHitQueue hq =
          new FieldSortedHitQueue(reader, lsort.getSort(), offset+len);
      searcher.search(query, new HitCollector() {
        public void collect(int doc, float score) {
          if (filt!=null && !filt.exists(doc)) return;
          numHits[0]++;
          hq.insert(new FieldDoc(doc, score));
        }
      });
      totalHits = numHits[0];
      maxScore = totalHits>0 ? hq.getMaxScore() : 0.0f;
      nDocsReturned = hq.size();
      ids = new int[nDocsReturned];
      scores = (flags&GET_SCORES)!=0 ? new float[nDocsReturned] : null;
      for (int i = nDocsReturned -1; i >= 0; i--) {
        FieldDoc fieldDoc = (FieldDoc)hq.pop();
        // fillFields is the point where score normalization happens
        // hq.fillFields(fieldDoc)
        ids[i] = fieldDoc.doc;
        if (scores != null) scores[i] = fieldDoc.score;
      }

Why are the document IDs and scores being retrieved from the
PriorityQueue in reverse order? I'm missing something obvious.

Thanks,
Peter
Re: sorting in SolrIndexSearcher

Yonik Seeley-2
On 10/20/06, Peter Keegan <[hidden email]> wrote:

> Why are the document IDs and scores being retrieved from the
> PriorityQueue in reverse order? I'm missing something obvious.

The PriorityQueue only gives you efficient access to its *smallest*
element, not its largest: top() returns it in constant time and pop()
removes it in log(N) time. So we have to retrieve the hits from
smallest to largest. But since we want the highest-ranked hit first, we
fill the array in reverse order, putting the smallest in the last
position and the largest in the first.

-Yonik
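
A minimal sketch of the pattern described in the reply above, for readers who want to try it outside Solr. It uses java.util.PriorityQueue rather than Lucene's own PriorityQueue, and plain float scores rather than FieldDoc objects. Both are assumptions made to keep the example self-contained, and the class and method names are hypothetical. The idea is the same: the heap only exposes its smallest element, so the result array is filled from the last slot back to the first.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

class TopNReverseFill {

    // Returns up to n of the highest scores, best first.
    static float[] topN(float[] scores, int n) {
        if (n <= 0) return new float[0];

        // Min-heap: peek() and poll() expose the SMALLEST element.
        PriorityQueue<Float> heap = new PriorityQueue<>();
        for (float s : scores) {
            if (heap.size() < n) {
                heap.add(s);
            } else if (s > heap.peek()) {
                // Evict the current smallest to make room for a better hit.
                heap.poll();
                heap.add(s);
            }
        }

        // poll() yields smallest to largest, so fill the array backwards:
        // the smallest lands in the last slot, the largest in slot 0.
        float[] out = new float[heap.size()];
        for (int i = out.length - 1; i >= 0; i--) {
            out[i] = heap.poll();
        }
        return out;
    }

    public static void main(String[] args) {
        float[] scores = {0.3f, 0.9f, 0.1f, 0.7f, 0.5f};
        System.out.println(Arrays.toString(topN(scores, 3))); // [0.9, 0.7, 0.5]
    }
}
```

Keeping the smallest element on top is what makes a bounded top-N queue cheap: each new hit is compared only against the current worst entry, which is exactly the one to evict.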
Re: sorting in SolrIndexSearcher

Peter Keegan
Aha. I thought Solr was doing things differently than Lucene, but now I see
the same thing in TopFieldDocCollector. Thanks Yonik.

Peter