[jira] [Created] (LUCENE-3096) MultiSearcher does not work correctly with Not on NumericRange


JIRA jira@apache.org
MultiSearcher does not work correctly with Not on NumericRange
--------------------------------------------------------------

                 Key: LUCENE-3096
                 URL: https://issues.apache.org/jira/browse/LUCENE-3096
             Project: Lucene - Java
          Issue Type: Bug
          Components: Search
    Affects Versions: 3.0.2
            Reporter: John Wang


Hi, Keith

My colleague xiaoyang and I just confirmed that this is actually due to a Lucene bug in MultiSearcher. In particular:

If we search with a NOT on a NumericRange and we use MultiSearcher, we
get wrong search results (however, if we use IndexSearcher, the
result is correct). Basically, the NOT of the NumericRange has no
effect in MultiSearcher. We suspect the cause is the createWeight()
function in MultiSearcher, and we hope you can help us fix this Lucene
bug. I attached the code to reproduce this case. Please check it
out.

In the attached code, I have two separate functions:

(1) testNumericRangeSingleSearcher(Query query)
    where I create 6 documents, each with a string field "id" and a
numeric field "numId" set to 1,2,3,4,5,6 respectively. Then I search
with the query
    +MatchAllDocs -NumericRange(3,3) on "numId". The expected result is
5 hits, since document 3 is excluded by the MUST_NOT clause.

(2) testNumericRangeMultiSearcher(Query query)
    where I create 2 RAMDirectory instances, each of which has 3 documents,
1,2,3; and 4,5,6. Then I run the same query as above using
MultiSearcher. The expected result is also 5 hits.

However, from (1) we get 5 hits, as expected, while in (2) we
get 6 hits, which is wrong.

We also tried this with our zoie/bobo open source tools and got
the same results, because our multi-bobo-browser is built on
MultiSearcher in Lucene.


I already emailed the Lucene community group. Hopefully we can get some feedback soon.
If you have any further concerns, please let me know!

Thank you very much!


Code:  (based on lucene 3.0.x)



import java.io.IOException;

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.store.RAMDirectory;

public class TestNumericRange
{
 public static void main(String[] args)
 {
   try
   {
     // Build the query: +MatchAllDocs -numId:[3 TO 3]
     BooleanQuery query = new BooleanQuery();
     query.add(NumericRangeQuery.newIntRange("numId", 3, 3, true, true), Occur.MUST_NOT);
     query.add(new MatchAllDocsQuery(), Occur.MUST);

     testNumericRangeSingleSearcher(query);
     testNumericRangeMultiSearcher(query);
   }
   catch (Exception e)
   {
     e.printStackTrace();
   }
 }



 public static void testNumericRangeSingleSearcher(Query query)
     throws CorruptIndexException, LockObtainFailedException, IOException
 {
   String[] ids = {"1", "2", "3", "4", "5", "6"};

   // Index all six documents into a single in-memory directory.
   Directory directory = new RAMDirectory();
   IndexWriter writer = new IndexWriter(directory, new WhitespaceAnalyzer(),
                                        IndexWriter.MaxFieldLength.UNLIMITED);

   for (int i = 0; i < ids.length; i++)
   {
     Document doc = new Document();
     doc.add(new Field("id", ids[i],
                       Field.Store.YES,
                       Field.Index.NOT_ANALYZED));
     doc.add(new NumericField("numId").setIntValue(Integer.valueOf(ids[i])));
     writer.addDocument(doc);
   }
   writer.close();

   IndexSearcher searcher = new IndexSearcher(directory);

   // Expected: 5 hits (every document except numId == 3).
   TopDocs docs = searcher.search(query, 10);
   System.out.println("SingleSearcher: testNumericRange: hitNum: " + docs.totalHits);
   for (ScoreDoc doc : docs.scoreDocs)
   {
     System.out.println(searcher.explain(query, doc.doc));
   }
   searcher.close();

   directory.close();
 }

 public static void testNumericRangeMultiSearcher(Query query)
     throws CorruptIndexException, LockObtainFailedException, IOException
 {
   // First index: documents 1, 2, 3.
   String[] ids1 = {"1", "2", "3"};
   Directory directory1 = new RAMDirectory();
   IndexWriter writer1 = new IndexWriter(directory1, new WhitespaceAnalyzer(),
                                         IndexWriter.MaxFieldLength.UNLIMITED);
   for (int i = 0; i < ids1.length; i++)
   {
     Document doc = new Document();
     doc.add(new Field("id", ids1[i],
                       Field.Store.YES,
                       Field.Index.NOT_ANALYZED));
     doc.add(new NumericField("numId").setIntValue(Integer.valueOf(ids1[i])));
     writer1.addDocument(doc);
   }
   writer1.close();

   // Second index: documents 4, 5, 6.
   String[] ids2 = {"4", "5", "6"};
   Directory directory2 = new RAMDirectory();
   IndexWriter writer2 = new IndexWriter(directory2, new WhitespaceAnalyzer(),
                                         IndexWriter.MaxFieldLength.UNLIMITED);
   for (int i = 0; i < ids2.length; i++)
   {
     Document doc = new Document();
     doc.add(new Field("id", ids2[i],
                       Field.Store.YES,
                       Field.Index.NOT_ANALYZED));
     doc.add(new NumericField("numId").setIntValue(Integer.valueOf(ids2[i])));
     writer2.addDocument(doc);
   }
   writer2.close();

   IndexSearcher[] searchers = new IndexSearcher[2];
   searchers[0] = new IndexSearcher(directory1);
   searchers[1] = new IndexSearcher(directory2);
   MultiSearcher multiSearcher = new MultiSearcher(searchers);

   // Expected: 5 hits, but MultiSearcher reports 6.
   TopDocs docs = multiSearcher.search(query, 10);
   System.out.println("MultiSearcher: testNumericRange: hitNum: " + docs.totalHits);
   for (ScoreDoc doc : docs.scoreDocs)
   {
     System.out.println(multiSearcher.explain(query, doc.doc));
   }
   multiSearcher.close();

   directory1.close();
   directory2.close();
 }


}


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

[jira] [Commented] (LUCENE-3096) MultiSearcher does not work correctly with Not on NumericRange

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033478#comment-13033478 ]

Uwe Schindler commented on LUCENE-3096:
---------------------------------------

This is a well-known bug (LUCENE-2756), which is unfixable (query rewrite across different searchers is wrong) without totally changing the way queries are rewritten.

To work around the bug, use a MultiReader on your IndexReaders and a simple IndexSearcher on top of that MultiReader:

{code}
// one slot per index; size the array to the number of indexes you open
IndexReader[] readers = new IndexReader[2];
readers[0] = IndexReader.open(directory);
readers[1] = IndexReader.open(otherdirectory);
...
IndexSearcher searcher = new IndexSearcher(new MultiReader(readers));
{code}
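
A minimal sketch of why searching through one MultiReader behaves differently from MultiSearcher for the query in this issue. This is a toy model in plain Java, not Lucene code: the class RewriteModel and its methods are made up for illustration, and it assumes that MultiSearcher rewrites the query once per sub-searcher and then ORs the distinct rewritten forms together, while a MultiReader presents a single logical index and therefore a single rewrite.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model (not Lucene code) of "+MatchAllDocs -NumericRange(lo,hi)" over int ids. */
final class RewriteModel {

    /** Rewriting against one index expands the range into the ids that index contains. */
    static Set<Integer> rewrite(List<Integer> index, int lo, int hi) {
        Set<Integer> excluded = new TreeSet<Integer>();
        for (int id : index) {
            if (id >= lo && id <= hi) {
                excluded.add(id);
            }
        }
        return excluded;
    }

    /** Hits in one index when the distinct rewritten forms are OR-ed together. */
    static int countHits(List<Integer> index, Set<Set<Integer>> forms) {
        int hits = 0;
        for (int id : index) {
            for (Set<Integer> excluded : forms) {
                if (!excluded.contains(id)) { // one non-excluding form is enough to match
                    hits++;
                    break;
                }
            }
        }
        return hits;
    }

    /** Assumed MultiSearcher behaviour: rewrite per sub-index, OR the results, search everywhere. */
    static int multiSearcherHits(List<List<Integer>> indexes, int lo, int hi) {
        Set<Set<Integer>> forms = new LinkedHashSet<Set<Integer>>();
        for (List<Integer> index : indexes) {
            forms.add(rewrite(index, lo, hi));
        }
        int hits = 0;
        for (List<Integer> index : indexes) {
            hits += countHits(index, forms);
        }
        return hits;
    }

    /** MultiReader behaviour in this model: one logical index, hence a single rewrite. */
    static int multiReaderHits(List<List<Integer>> indexes, int lo, int hi) {
        List<Integer> all = new ArrayList<Integer>();
        for (List<Integer> index : indexes) {
            all.addAll(index);
        }
        Set<Set<Integer>> forms = new LinkedHashSet<Set<Integer>>();
        forms.add(rewrite(all, lo, hi));
        return countHits(all, forms);
    }

    public static void main(String[] args) {
        List<List<Integer>> indexes =
            Arrays.asList(Arrays.asList(1, 2, 3), Arrays.asList(4, 5, 6));
        System.out.println("MultiSearcher model: " + multiSearcherHits(indexes, 3, 3));
        System.out.println("MultiReader model:   " + multiReaderHits(indexes, 3, 3));
    }
}
```

With the two indexes from the report (1,2,3 and 4,5,6), the second index contains no id in the range, so its rewritten form excludes nothing; once that form is OR-ed in, document 3 matches again. That reproduces the reported 6 hits in the MultiSearcher model and 5 in the MultiReader model.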

MultiSearcher and ParallelMultiSearcher were deprecated in 3.1 because of this and will disappear in the upcoming Lucene 4.0. The functionality of ParallelMultiSearcher is now available through IndexSearcher in 3.1 (it parallelizes across index segments, LUCENE-2837).

I will close this as won't fix if nobody objects.

[jira] [Commented] (LUCENE-3096) MultiSearcher does not work correctly with Not on NumericRange

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033479#comment-13033479 ]

Uwe Schindler commented on LUCENE-3096:
---------------------------------------

This was also already reported and answered on the java-user@lucene.apache.org list: [http://www.gossamer-threads.com/lists/lucene/java-user/123996]

[jira] [Resolved] (LUCENE-3096) MultiSearcher does not work correctly with Not on NumericRange

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler resolved LUCENE-3096.
-----------------------------------

       Resolution: Duplicate
    Fix Version/s: 3.1

This is a duplicate of LUCENE-2756 and was fixed by deprecating (3.1) and removing (4.0) the broken (Parallel)MultiSearcher in favour of an IndexSearcher on top of a MultiReader.

[jira] [Commented] (LUCENE-3096) MultiSearcher does not work correctly with Not on NumericRange

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033504#comment-13033504 ]

Uwe Schindler commented on LUCENE-3096:
---------------------------------------

An alternative way to fix this in 3.0 (without giving up MultiSearcher) is to set the rewrite mode of MultiTermQueries (like NumericRangeQuery) to a constant-score filter rewrite (MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE). But this only fixes the bug for those queries (as no BooleanQuery is used during rewrite).

Altogether, negative queries in MultiSearcher are broken, and whether the bug actually affects search results depends on the index contents.
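
To make the dependence on index contents concrete, here is a small plain-Java sketch. It is a toy model, not Lucene code: RewriteFormsModel, rewriteKey and distinctRewrites are hypothetical names, and the model assumes the bug can only surface when the sub-searchers rewrite the negated range to different forms. A term-expanding rewrite depends on which ids each sub-index holds; a constant-score style rewrite is modelled as keeping the range itself, so it is the same for every sub-index.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model (not Lucene code): how many distinct forms does a negated range rewrite to? */
final class RewriteFormsModel {

    /** Returns a key describing what the range rewrites to for one sub-index. */
    static String rewriteKey(List<Integer> index, int lo, int hi, boolean expandTerms) {
        if (!expandTerms) {
            // Constant-score style: the range is kept as-is, independent of index contents.
            return "range[" + lo + "," + hi + "]";
        }
        // Term-expanding style: enumerate the ids this sub-index holds inside the range.
        Set<Integer> ids = new TreeSet<Integer>();
        for (int id : index) {
            if (id >= lo && id <= hi) {
                ids.add(id);
            }
        }
        return "ids" + ids;
    }

    /** More than one distinct key means the sub-searchers disagree about the rewrite. */
    static int distinctRewrites(List<List<Integer>> indexes, int lo, int hi, boolean expandTerms) {
        Set<String> keys = new HashSet<String>();
        for (List<Integer> index : indexes) {
            keys.add(rewriteKey(index, lo, hi, expandTerms));
        }
        return keys.size();
    }

    public static void main(String[] args) {
        List<List<Integer>> disjoint =
            Arrays.asList(Arrays.asList(1, 2, 3), Arrays.asList(4, 5, 6));
        List<List<Integer>> overlapping =
            Arrays.asList(Arrays.asList(1, 2, 3), Arrays.asList(3, 4, 5));

        System.out.println("disjoint, expanding:      " + distinctRewrites(disjoint, 3, 3, true));
        System.out.println("overlapping, expanding:   " + distinctRewrites(overlapping, 3, 3, true));
        System.out.println("disjoint, constant-score: " + distinctRewrites(disjoint, 3, 3, false));
    }
}
```

In this model the indexes from the report (1,2,3 and 4,5,6) produce two different rewrites, which is the situation where the wrong hit count appears. If both sub-indexes happened to contain id 3, or if the rewrite did not expand terms at all, every sub-searcher would agree and the problem would stay hidden.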

> MultiSearcher does not work correctly with Not on NumericRange
> --------------------------------------------------------------
>
>                 Key: LUCENE-3096
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3096
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 3.0.2
>            Reporter: John Wang
>             Fix For: 3.1
>


[jira] [Issue Comment Edited] (LUCENE-3096) MultiSearcher does not work correctly with Not on NumericRange

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033504#comment-13033504 ]

Uwe Schindler edited comment on LUCENE-3096 at 5/14/11 11:35 AM:
-----------------------------------------------------------------

An alternative way to bypass this in 3.0 (without giving up MultiSearcher) is to set the rewrite mode of MultiTermQueries (like NumericRangeQuery) to CONSTANT_SCORE_REWRITE. But this only fixes the bug for those queries (as no BooleanQuery is used during rewrite).

Altogether, negative queries in MultiSearcher are broken, and whether the bug actually affects search results depends on the index contents.

      was (Author: thetaphi):
    An alternative way to fix this in 3.0 (without giving up MultiSearcher) is to set the rewrite mode of MultiTermQueries (like NumericRangeQuery) to CONSTANT_SCORE_REWRITE. But this only fixes the bug for those queries (as no BooleanQuery is used during rewrite).

Altogether, negative queries in MultiSearcher are broken, and whether the bug actually affects search results depends on the index contents.
 



[jira] [Commented] (LUCENE-3096) MultiSearcher does not work correctly with Not on NumericRange

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033878#comment-13033878 ]

Xiaoyang Gu commented on LUCENE-3096:
-------------------------------------

Thank you very much!

Xiaoyang



[jira] [Commented] (LUCENE-3096) MultiSearcher does not work correctly with Not on NumericRange

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034289#comment-13034289 ]

hao yan commented on LUCENE-3096:
---------------------------------

Thanks, Uwe!



> MultiSearcher does not work correctly with Not on NumericRange
> --------------------------------------------------------------
>
>                 Key: LUCENE-3096
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3096
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.0.2
>            Reporter: John Wang
>             Fix For: 3.1
