[jira] [Commented] (LUCENE-2482) Index sorter

Sebastian Nagel (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199201#comment-13199201 ]

Pablo Castellanos commented on LUCENE-2482:

Hi, I wanted to implement some early termination strategies over my Lucene index, and since that requires reordering the index I started playing with the 4.0 patch.

I found that a lot of functions have changed over the past year, so I had to make some modifications, mainly:

public TermFreqVector[] getTermFreqVectors(int docNumber)
        throws IOException {
  return super.getTermFreqVectors(newToOld[docNumber]);
}

public Fields getTermVectors(int docID) throws IOException {
  return super.getTermVectors(newToOld[docID]);
}

public Document document(int n, FieldSelector fieldSelector)
        throws CorruptIndexException, IOException {
  return super.document(newToOld[n], fieldSelector);
}

public void document(int docID, StoredFieldVisitor visitor)
        throws CorruptIndexException, IOException {
  super.document(newToOld[docID], visitor);
}

There is also a getDeletedDocs function, and I haven't found a good replacement for it:

    public Bits getDeletedDocs() {
      final Bits deletedDocs = super.getDeletedDocs();

      if (deletedDocs == null)
        return null;

      return new Bits() {
        public boolean get(int index) {
          return deletedDocs.get(newToOld[index]);
        }

        public int length() {
          return deletedDocs.length();
        }
      };
    }
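
One possible direction, offered only as a sketch: as far as I know, Lucene 4.0 readers expose getLiveDocs() instead of getDeletedDocs(), with the meaning inverted (a set bit marks a live document, and null means the reader has no deletions). The same newToOld wrapper should carry over unchanged. The Bits interface below is a local stand-in so that the example compiles on its own, and the class name, remap and of are hypothetical names, not part of the patch.

```java
public class LiveDocsRemapSketch {

    /** Local stand-in for Lucene's Bits interface (assumption: same two methods). */
    interface Bits {
        boolean get(int index);
        int length();
    }

    /** Hypothetical helper: wraps a boolean array as Bits. */
    static Bits of(final boolean... flags) {
        return new Bits() {
            @Override public boolean get(int index) { return flags[index]; }
            @Override public int length() { return flags.length; }
        };
    }

    /**
     * Presents the old reader's live docs under the new (sorted) doc IDs.
     * newToOld[newId] gives the old doc ID stored at that position.
     */
    static Bits remap(final Bits liveDocs, final int[] newToOld) {
        if (liveDocs == null) {
            return null; // no deletions in the wrapped reader
        }
        return new Bits() {
            @Override public boolean get(int index) {
                return liveDocs.get(newToOld[index]);
            }
            @Override public int length() {
                return liveDocs.length();
            }
        };
    }

    public static void main(String[] args) {
        Bits oldLive = of(true, false, true); // old doc 1 is deleted
        int[] newToOld = {2, 0, 1};           // new 0 = old 2, new 1 = old 0, new 2 = old 1
        Bits sorted = remap(oldLive, newToOld);
        // prints "true true false": the deleted document now sits at new ID 2
        System.out.println(sorted.get(0) + " " + sorted.get(1) + " " + sorted.get(2));
    }
}
```

In a reader subclass this would be the body of an overridden getLiveDocs(), calling super.getLiveDocs() in place of the oldLive argument.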

After applying these changes and running the code against my Lucene index, I get some odd results. The new sort order seems to have been applied, but the posting lists still point to the old document IDs.

Imagine that I have two documents in my index and I want to sort them by price, so that the most expensive item gets the lowest docId:

Document 1
{panel}docId:1, name: iPod, price: 100${panel}

Document 2
{panel}docId:2, name: iPhone, price: 300${panel}

I run my modified version of IndexSorter over it and then query the new index. If I query for _name:iPhone_ I get:
{panel}docId:2, name: iPod, price: 100${panel}

That leads me to believe that the documents have been sorted but the new index is still using the old posting lists.
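
To make the suspected mismatch concrete, here is a small self-contained sketch. It is not the IndexSorter code: the class and method names are hypothetical, doc IDs are 0-based (so the iPod is doc 0 and the iPhone is doc 1), and a plain int array stands in for a posting list. It assumes stored fields are read through newToOld, as in the overrides above, and shows that postings then need the inverse mapping (oldToNew) to stay consistent.

```java
import java.util.Arrays;

public class DocIdRemapSketch {

    /** newToOld[newId] = oldId, ordered so the highest weight gets the lowest new ID. */
    static int[] newToOld(final float[] weights) {
        Integer[] ids = new Integer[weights.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        // Descending by weight; the object sort is stable, so ties keep their old order.
        Arrays.sort(ids, (a, b) -> Float.compare(weights[b], weights[a]));
        int[] result = new int[ids.length];
        for (int i = 0; i < ids.length; i++) {
            result[i] = ids[i];
        }
        return result;
    }

    /** Inverts the permutation: oldToNew[oldId] = newId. */
    static int[] invert(int[] newToOld) {
        int[] oldToNew = new int[newToOld.length];
        for (int newId = 0; newId < newToOld.length; newId++) {
            oldToNew[newToOld[newId]] = newId;
        }
        return oldToNew;
    }

    /** Rewrites a posting list into new doc IDs and restores increasing order. */
    static int[] remapPostings(int[] oldPostings, int[] oldToNew) {
        int[] out = new int[oldPostings.length];
        for (int i = 0; i < oldPostings.length; i++) {
            out[i] = oldToNew[oldPostings[i]];
        }
        Arrays.sort(out);
        return out;
    }

    public static void main(String[] args) {
        String[] storedNames = {"iPod", "iPhone"}; // stored fields, indexed by old doc ID
        float[] prices = {100f, 300f};
        int[] iPhonePostings = {1};                // term name:iPhone matches old doc 1

        int[] n2o = newToOld(prices);              // [1, 0]
        int[] o2n = invert(n2o);                   // [1, 0]

        // Postings left untouched: doc ID 1 is read as a new ID and resolves to the iPod.
        int stale = iPhonePostings[0];
        System.out.println("stale postings    -> " + storedNames[n2o[stale]]);

        // Postings remapped through oldToNew: the hit resolves to the iPhone.
        int fixed = remapPostings(iPhonePostings, o2n)[0];
        System.out.println("remapped postings -> " + storedNames[n2o[fixed]]);
    }
}
```

The first line printed names the iPod and the second the iPhone, which matches the symptom described above: stored fields follow the new order while the hit is still reported under its old doc ID.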

So I have two questions: are you planning to update this code for newer versions of Lucene 4.0, or am I on my own to get it working? And if the latter, where should I look to solve this problem?

Thanks in advance for your help.

> Index sorter
> ------------
>                 Key: LUCENE-2482
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2482
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: modules/other
>    Affects Versions: 3.1, 4.0
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>             Fix For: 3.6, 4.0
>         Attachments: LUCENE-2482-4.0.patch, indexSorter.patch
> A tool to sort index according to a float document weight. Documents with high weight are given low document numbers, which means that they will be first evaluated. When using a strategy of "early termination" of queries (see TimeLimitedCollector) such sorting significantly improves the quality of partial results.
> (Originally this tool was created by Doug Cutting in Nutch, and used norms as document weights - thus the ordering was limited by the limited resolution of norms. This is a pure Lucene version of the tool, and it uses arbitrary floats from a specified stored field).
