[jira] Created: (LUCENE-1374) Merging of compressed string Fields may hit NPE

Nick Burch (Jira)
Merging of compressed string Fields may hit NPE
-----------------------------------------------

                 Key: LUCENE-1374
                 URL: https://issues.apache.org/jira/browse/LUCENE-1374
             Project: Lucene - Java
          Issue Type: Bug
          Components: Index
    Affects Versions: 2.4
            Reporter: Michael McCandless
            Assignee: Michael McCandless
             Fix For: 2.4


This bug was introduced with LUCENE-1219 (only present on 2.4).

The bug happens when merging compressed string fields, but only if bulk-merging code does not apply because the FieldInfos for the segment being merged are not congruent.  This test shows the bug:

{code}
  public void testMergeCompressedFields() throws IOException {
    File indexDir = new File(System.getProperty("tempDir"), "mergecompressedfields");
    Directory dir = FSDirectory.getDirectory(indexDir);
    try {
      for(int i=0;i<5;i++) {
        // Must make a new writer & doc each time, w/
        // different fields, so bulk merge of stored fields
        // cannot run:
        IndexWriter w = new IndexWriter(dir, new WhitespaceAnalyzer(), i==0, IndexWriter.MaxFieldLength.UNLIMITED);
        w.setMergeFactor(5);
        w.setMergeScheduler(new SerialMergeScheduler());
        Document doc = new Document();
        doc.add(new Field("test1", "this is some data that will be compressed this this this", Field.Store.COMPRESS, Field.Index.NO));
        doc.add(new Field("test2", new byte[20], Field.Store.COMPRESS));
        doc.add(new Field("field" + i, "random field", Field.Store.NO, Field.Index.TOKENIZED));
        w.addDocument(doc);
        w.close();
      }

      byte[] cmp = new byte[20];

      IndexReader r = IndexReader.open(dir);
      for(int i=0;i<5;i++) {
        Document doc = r.document(i);
        assertEquals("this is some data that will be compressed this this this", doc.getField("test1").stringValue());
        byte[] b = doc.getField("test2").binaryValue();
        assertTrue(Arrays.equals(b, cmp));
      }
    } finally {
      dir.close();
      _TestUtil.rmDir(indexDir);
    }
  }
{code}

The cause is in FieldsReader: when we load a field "for merge" we create a FieldForMerge instance, which subsequently does not return the right values for getBinary{Value,Length,Offset}.
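
To illustrate why the test adds a differently named field on every pass, here is a minimal sketch of a congruence check. It is not Lucene code: the class FieldNumberingSketch and both of its methods are hypothetical, and it assumes that "congruent" means a segment and the merged result assign the same number to every field name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not Lucene code: a segment's fields are modelled as an
// ordered list of names, where the list index stands in for the field number.
final class FieldNumberingSketch {

  // Assumption: "congruent" means both sides give every field name the same
  // number, i.e. the ordered name lists are identical.
  static boolean congruent(List<String> segmentFields, List<String> mergedFields) {
    return segmentFields.equals(mergedFields);
  }

  // Builds the merged numbering by appending each unseen name in segment order.
  static List<String> mergedNumbering(List<List<String>> segments) {
    List<String> merged = new ArrayList<String>();
    for (List<String> segment : segments) {
      for (String name : segment) {
        if (!merged.contains(name)) {
          merged.add(name);
        }
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    // Same field names as the test: test1, test2 and one "field" + i per segment.
    List<List<String>> segments = new ArrayList<List<String>>();
    for (int i = 0; i < 5; i++) {
      segments.add(Arrays.asList("test1", "test2", "field" + i));
    }
    List<String> merged = mergedNumbering(segments);
    for (List<String> segment : segments) {
      System.out.println(segment + " congruent with merged: " + congruent(segment, merged));
    }
  }
}
```

Under that assumption, none of the five single-document segments matches the merged numbering, which is the situation the test needs in order to keep the bulk-merge path from running.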

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

[jira] Updated: (LUCENE-1374) Merging of compressed string Fields may hit NPE

Nick Burch (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1374:
---------------------------------------

    Attachment: LUCENE-1374.patch

Attached a patch that fixes AbstractField's getBinaryValue() and getBinaryLength() methods to fall back to "fieldsData instanceof byte[]" when appropriate.  I plan to commit shortly.
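
The fallback described above can be sketched roughly as follows. This is not the attached patch: BinaryFallbackSketch is a hypothetical stand-in for a stored field, and the scenario in main (a field whose data is a byte[] although its binary flag is unset) is an assumption about how the merge path ends up with a null value.

```java
// Hypothetical stand-in for a stored field; not the real AbstractField.
final class BinaryFallbackSketch {
  private final Object fieldsData; // may hold a String or a byte[]
  private final boolean isBinary;  // flag that a merge-time loader might leave unset

  BinaryFallbackSketch(Object fieldsData, boolean isBinary) {
    if (isBinary && !(fieldsData instanceof byte[])) {
      throw new IllegalArgumentException("binary fields must hold a byte[]");
    }
    this.fieldsData = fieldsData;
    this.isBinary = isBinary;
  }

  // Trusts only the flag: returns null when the flag is unset,
  // even if the data really is a byte[].
  byte[] binaryValueFlagOnly() {
    return isBinary ? (byte[]) fieldsData : null;
  }

  // Falls back to inspecting the data itself when the flag is unset.
  byte[] binaryValue() {
    if (isBinary || fieldsData instanceof byte[]) {
      return (byte[]) fieldsData;
    }
    return null;
  }

  // Length derived from the same fallback; 0 when there is no binary data.
  int binaryLength() {
    byte[] value = binaryValue();
    return value == null ? 0 : value.length;
  }

  public static void main(String[] args) {
    // Assumed merge-time state: raw bytes present, binary flag not set.
    BinaryFallbackSketch field = new BinaryFallbackSketch(new byte[20], false);
    System.out.println("flag only: " + field.binaryValueFlagOnly());
    System.out.println("with fallback, length: " + field.binaryLength());
  }
}
```

A caller that dereferences the result of the flag-only variant would hit a NullPointerException in this state, while the fallback variant returns the bytes.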

[jira] Resolved: (LUCENE-1374) Merging of compressed string Fields may hit NPE

Nick Burch (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1374.
----------------------------------------

    Resolution: Fixed

Committed revision 691617.

[jira] Commented: (LUCENE-1374) Merging of compressed string Fields may hit NPE

Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628055#action_12628055 ]

Chris Harris commented on LUCENE-1374:
--------------------------------------

Running "ant test" on revision 691617 fails for me on the following test:

  <testcase classname="org.apache.lucene.index.TestIndexWriter" name="testMergeCompressedFields" time="0.36">
    <error message="could not delete C:\lucene\691647\build\test\mergecompressedfields\_5.cfs" type="java.io.IOException">java.io.IOException: could not delete C:\lucene\691647\build\test\mergecompressedfields\_5.cfs
        at org.apache.lucene.util._TestUtil.rmDir(_TestUtil.java:37)
        at org.apache.lucene.index.TestIndexWriter.testMergeCompressedFields(TestIndexWriter.java:4111)
</error>
  </testcase>

It might be one of those things that shows up only on Windows. In any case, adding a call to IndexReader.close() in testMergeCompressedFields() seems to fix things up:

      IndexReader r = IndexReader.open(dir);
      for(int i=0;i<5;i++) {
        Document doc = r.document(i);
        assertEquals("this is some data that will be compressed this this this", doc.getField("test1").stringValue());
        byte[] b = doc.getField("test2").binaryValue();
        assertTrue(Arrays.equals(b, cmp));
      }
      r.close();  // <------------------------------- New line
    } finally {
      dir.close();
      _TestUtil.rmDir(indexDir);
    }

I guess technically the r.close() probably belongs in a finally block as well.
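
A minimal sketch of that nested try/finally arrangement, using a hypothetical Tracked resource in place of the real IndexReader and Directory so that it runs on its own. The names CloseOrderSketch, Tracked and run are illustrative only; the point is the ordering: the reader is closed before the directory, and both are closed before the directory is deleted, even when the loop body fails.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

final class CloseOrderSketch {

  // Hypothetical resource that records when it is closed.
  static final class Tracked implements Closeable {
    private final String name;
    private final List<String> log;

    Tracked(String name, List<String> log) {
      this.name = name;
      this.log = log;
    }

    @Override
    public void close() {
      log.add("close " + name);
    }
  }

  // Returns the order in which cleanup steps ran.
  static List<String> run(boolean failInBody) {
    List<String> log = new ArrayList<String>();
    Tracked dir = new Tracked("dir", log);
    try {
      Tracked reader = new Tracked("reader", log);
      try {
        if (failInBody) {
          throw new IllegalStateException("simulated assertion failure");
        }
        log.add("body ok");
      } finally {
        reader.close(); // runs whether or not the body threw
      }
    } catch (IllegalStateException e) {
      log.add("caught " + e.getMessage());
    } finally {
      dir.close();
      log.add("rmDir"); // stands in for deleting the index directory
    }
    return log;
  }

  public static void main(String[] args) {
    System.out.println(run(false));
    System.out.println(run(true));
  }
}
```

With this shape, a failing assertion inside the loop would no longer leave the reader open when the directory is removed.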

[jira] Issue Comment Edited: (LUCENE-1374) Merging of compressed string Fields may hit NPE

Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628055#action_12628055 ]

ryguasu edited comment on LUCENE-1374 at 9/3/08 10:07 AM:
---------------------------------------------------------------

Running "ant test" on revision 691617 fails for me on the following test:

  <testcase classname="org.apache.lucene.index.TestIndexWriter" name="testMergeCompressedFields" time="0.36">
    <error message="could not delete C:\lucene\691647\build\test\mergecompressedfields\_5.cfs" type="java.io.IOException">java.io.IOException: could not delete C:\lucene\691647\build\test\mergecompressedfields\_5.cfs
        at org.apache.lucene.util._TestUtil.rmDir(_TestUtil.java:37)
        at org.apache.lucene.index.TestIndexWriter.testMergeCompressedFields(TestIndexWriter.java:4111)
</error>
  </testcase>

It might be one of those things that shows up only on Windows. In any case, adding a call to IndexReader.close() in testMergeCompressedFields() seems to fix things up:

{code}
      IndexReader r = IndexReader.open(dir);
      for(int i=0;i<5;i++) {
        Document doc = r.document(i);
        assertEquals("this is some data that will be compressed this this this", doc.getField("test1").stringValue());
        byte[] b = doc.getField("test2").binaryValue();
        assertTrue(Arrays.equals(b, cmp));
      }
      r.close();  // <------------------------------- New line
    } finally {
      dir.close();
      _TestUtil.rmDir(indexDir);
    }
{code}

I guess technically the r.close() probably belongs in a finally block as well.


[jira] Commented: (LUCENE-1374) Merging of compressed string Fields may hit NPE

Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628067#action_12628067 ]

Michael McCandless commented on LUCENE-1374:
--------------------------------------------

Whoops, you're right: I also see that failure (rmDir cannot delete the directory), and only on Windows.  I'll commit a fix.  Thanks, Chris!
