[GitHub] lucene-solr pull request #516: LUCENE-8594: DV update are broken for updates...


[GitHub] lucene-solr pull request #516: LUCENE-8594: DV update are broken for updates...

dnhatn
GitHub user s1monw opened a pull request:

    https://github.com/apache/lucene-solr/pull/516

    LUCENE-8594: DV updates are broken for updates on a new field

    A segment written with Lucene70Codec fails if it tries to update
    a DV field that didn't exist in the index before it was upgraded to
    Lucene80Codec. We bake the DV format into the FieldInfo when it's used
    for the first time and therefore never go to the codec when we need to
    update. Yet on a field that didn't exist before and was added during an
    indexing operation, we have to consult the codec and get an exception.
    This change fixes the issue and adds the relevant bwc tests.
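
The failure mode described above can be sketched in isolation. This is a simplified illustration (not Lucene's real API; all names here are hypothetical): the per-field DV format is "baked in" the first time a field is written, so only brand-new fields fall through to the codec — and a read-only codec that throws on that fallback breaks DV updates on new fields.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the bug: known fields use their baked-in format and
// never reach the codec; unknown (new) fields must consult the codec, which
// throws when it was written to be read-only.
class DvFormatSketch {
    // format name baked into per-field metadata on first use
    private final Map<String, String> bakedFormats = new HashMap<>();
    private final boolean codecIsReadOnly;

    DvFormatSketch(boolean codecIsReadOnly) {
        this.codecIsReadOnly = codecIsReadOnly;
    }

    String formatForUpdate(String field) {
        String baked = bakedFormats.get(field);
        if (baked != null) {
            return baked; // known field: the codec is never consulted
        }
        // new field: must consult the codec, which may refuse to write
        if (codecIsReadOnly) {
            throw new IllegalStateException(
                "This codec should only be used for reading, not writing");
        }
        String fmt = "Lucene70"; // hypothetical default format name
        bakedFormats.put(field, fmt);
        return fmt;
    }
}
```

The fix in the PR corresponds to making the read-only branch return a default format instead of throwing.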

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/s1monw/lucene-solr fix_dv_update_on_old_index

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/lucene-solr/pull/516.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #516
   
----
commit 45bb0938105cae6aaaa62d04f757e1cfe70ecb76
Author: Simon Willnauer <simonw@...>
Date:   2018-12-06T10:31:02Z

    LUCENE-8594: DV updates are broken for updates on a new field

    A segment written with Lucene70Codec fails if it tries to update
    a DV field that didn't exist in the index before it was upgraded to
    Lucene80Codec. We bake the DV format into the FieldInfo when it's used
    for the first time and therefore never go to the codec when we need to
    update. Yet on a field that didn't exist before and was added during an
    indexing operation, we have to consult the codec and get an exception.
    This change fixes the issue and adds the relevant bwc tests.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


[GitHub] lucene-solr issue #516: LUCENE-8594: DV update are broken for updates on new...

dnhatn
Github user s1monw commented on the issue:

    https://github.com/apache/lucene-solr/pull/516
 
    this was found in https://github.com/elastic/elasticsearch/pull/36286


---


[GitHub] lucene-solr pull request #516: LUCENE-8594: DV update are broken for updates...

dnhatn
In reply to this post by dnhatn
Github user jpountz commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/516#discussion_r239407845
 
    --- Diff: lucene/backward-codecs/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java ---
    @@ -1587,6 +1590,76 @@ public void testDocValuesUpdates() throws Exception {
         writer.close();
         dir.close();
       }
    +
    +  public void testSoftDeletes() throws Exception {
    +    Path oldIndexDir = createTempDir("dvupdates");
    +    TestUtil.unzip(getDataInputStream(dvUpdatesIndex), oldIndexDir);
    +    Directory dir = newFSDirectory(oldIndexDir);
    +    verifyUsesDefaultCodec(dir, dvUpdatesIndex);
    +    IndexWriterConfig conf = new IndexWriterConfig(new MockAnalyzer(random())).setSoftDeletesField("__soft_delete");
    +    IndexWriter writer = new IndexWriter(dir, conf);
    +    int maxDoc = writer.maxDoc();
    +    writer.updateDocValues(new Term("id", "1"),new NumericDocValuesField("__soft_delete", 1));
    +
    +    if (random().nextBoolean()) {
    +      writer.commit();
    +    }
    +    writer.forceMerge(1);
    +    writer.commit();
    +    assertEquals(maxDoc-1, writer.maxDoc());
    +    writer.close();
    +    dir.close();
    +  }
    +
    +  public void testDocValuesUpdatesWithNewField() throws Exception {
    +    Path oldIndexDir = createTempDir("dvupdates");
    +    TestUtil.unzip(getDataInputStream(dvUpdatesIndex), oldIndexDir);
    +    Directory dir = newFSDirectory(oldIndexDir);
    +    verifyUsesDefaultCodec(dir, dvUpdatesIndex);
    +
    +    // update fields and verify index
    +    IndexWriterConfig conf = new IndexWriterConfig(new MockAnalyzer(random()));
    +    IndexWriter writer = new IndexWriter(dir, conf);
    +    // introduce a new field that we later update
    +    writer.addDocument(Arrays.asList(new StringField("id", "" + Integer.MAX_VALUE, Field.Store.NO),
    +        new NumericDocValuesField("new_numeric", 1),
    +        new BinaryDocValuesField("new_binary", toBytes(1))));
    +    writer.updateNumericDocValue(new Term("id", "1"), "new_numeric", 1);
    +    writer.updateBinaryDocValue(new Term("id", "1"), "new_binary", toBytes(1));
    +
    +    writer.commit();
    +    Callable assertDV = () -> {
    +      boolean found = false;
    +      try (DirectoryReader reader = DirectoryReader.open(dir)) {
    +        for (LeafReaderContext ctx : reader.leaves()) {
    +          LeafReader leafReader = ctx.reader();
    +          TermsEnum id = leafReader.terms("id").iterator();
    +          if (id.seekExact(new BytesRef("1"))) {
    +            PostingsEnum postings = id.postings(null, PostingsEnum.NONE);
    +            NumericDocValues numericDocValues = leafReader.getNumericDocValues("new_numeric");
    +            BinaryDocValues binaryDocValues = leafReader.getBinaryDocValues("new_binary");
    +            int doc;
    +            while ((doc = postings.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
    +              found = true;
    +              assertEquals(doc, binaryDocValues.nextDoc());
    +              assertEquals(doc, numericDocValues.nextDoc());
    +              assertEquals(1, numericDocValues.longValue());
    +              assertEquals(toBytes(1), binaryDocValues.binaryValue());
    +            }
    +          }
    --- End diff --
   
    for completeness, add `assertEquals(DocIdSetIterator.NO_MORE_DOCS, binaryDocValues.nextDoc()); assertEquals(DocIdSetIterator.NO_MORE_DOCS, numericDocValues.nextDoc());`?


---


[GitHub] lucene-solr pull request #516: LUCENE-8594: DV update are broken for updates...

dnhatn
In reply to this post by dnhatn
Github user jpountz commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/516#discussion_r239408003
 
    --- Diff: lucene/backward-codecs/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java ---
    @@ -1587,6 +1590,76 @@ public void testDocValuesUpdates() throws Exception {
         writer.close();
         dir.close();
       }
    +
    +  public void testSoftDeletes() throws Exception {
    +    Path oldIndexDir = createTempDir("dvupdates");
    +    TestUtil.unzip(getDataInputStream(dvUpdatesIndex), oldIndexDir);
    +    Directory dir = newFSDirectory(oldIndexDir);
    +    verifyUsesDefaultCodec(dir, dvUpdatesIndex);
    +    IndexWriterConfig conf = new IndexWriterConfig(new MockAnalyzer(random())).setSoftDeletesField("__soft_delete");
    +    IndexWriter writer = new IndexWriter(dir, conf);
    +    int maxDoc = writer.maxDoc();
    +    writer.updateDocValues(new Term("id", "1"),new NumericDocValuesField("__soft_delete", 1));
    +
    +    if (random().nextBoolean()) {
    +      writer.commit();
    +    }
    +    writer.forceMerge(1);
    +    writer.commit();
    +    assertEquals(maxDoc-1, writer.maxDoc());
    +    writer.close();
    +    dir.close();
    +  }
    +
    +  public void testDocValuesUpdatesWithNewField() throws Exception {
    +    Path oldIndexDir = createTempDir("dvupdates");
    +    TestUtil.unzip(getDataInputStream(dvUpdatesIndex), oldIndexDir);
    +    Directory dir = newFSDirectory(oldIndexDir);
    +    verifyUsesDefaultCodec(dir, dvUpdatesIndex);
    +
    +    // update fields and verify index
    +    IndexWriterConfig conf = new IndexWriterConfig(new MockAnalyzer(random()));
    +    IndexWriter writer = new IndexWriter(dir, conf);
    +    // introduce a new field that we later update
    +    writer.addDocument(Arrays.asList(new StringField("id", "" + Integer.MAX_VALUE, Field.Store.NO),
    +        new NumericDocValuesField("new_numeric", 1),
    +        new BinaryDocValuesField("new_binary", toBytes(1))));
    +    writer.updateNumericDocValue(new Term("id", "1"), "new_numeric", 1);
    +    writer.updateBinaryDocValue(new Term("id", "1"), "new_binary", toBytes(1));
    +
    +    writer.commit();
    +    Callable assertDV = () -> {
    --- End diff --
   
    s/Callable/Runnable/ since you don't care about the return value
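
The distinction being discussed can be shown with a small self-contained example: a lambda body that throws a checked exception (such as the `IOException` from `DirectoryReader.open` in the test) fits `Callable.call()`, which declares `throws Exception`, but not `Runnable.run()`, which declares no checked exceptions, so the `Runnable` version would not compile.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Callable vs Runnable with checked exceptions: only Callable.call()
// declares "throws Exception", so a lambda that may throw a checked
// IOException is assignable to Callable but not to Runnable.
public class CallableVsRunnable {
    static void mayThrow() throws IOException {
        // stands in for I/O work such as opening an index reader
    }

    static String runCheck() throws Exception {
        Callable<String> assertDV = () -> {
            mayThrow();        // compiles: call() declares throws Exception
            return "checked";  // the return value is incidental here
        };
        // Runnable r = () -> mayThrow(); // would NOT compile: unhandled IOException
        return assertDV.call();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runCheck());
    }
}
```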


---


[GitHub] lucene-solr pull request #516: LUCENE-8594: DV update are broken for updates...

dnhatn
In reply to this post by dnhatn
Github user s1monw commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/516#discussion_r239408275
 
    --- Diff: lucene/backward-codecs/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java ---
    @@ -1587,6 +1590,76 @@ public void testDocValuesUpdates() throws Exception {
         writer.close();
         dir.close();
       }
    +
    +  public void testSoftDeletes() throws Exception {
    +    Path oldIndexDir = createTempDir("dvupdates");
    +    TestUtil.unzip(getDataInputStream(dvUpdatesIndex), oldIndexDir);
    +    Directory dir = newFSDirectory(oldIndexDir);
    +    verifyUsesDefaultCodec(dir, dvUpdatesIndex);
    +    IndexWriterConfig conf = new IndexWriterConfig(new MockAnalyzer(random())).setSoftDeletesField("__soft_delete");
    +    IndexWriter writer = new IndexWriter(dir, conf);
    +    int maxDoc = writer.maxDoc();
    +    writer.updateDocValues(new Term("id", "1"),new NumericDocValuesField("__soft_delete", 1));
    +
    +    if (random().nextBoolean()) {
    +      writer.commit();
    +    }
    +    writer.forceMerge(1);
    +    writer.commit();
    +    assertEquals(maxDoc-1, writer.maxDoc());
    +    writer.close();
    +    dir.close();
    +  }
    +
    +  public void testDocValuesUpdatesWithNewField() throws Exception {
    +    Path oldIndexDir = createTempDir("dvupdates");
    +    TestUtil.unzip(getDataInputStream(dvUpdatesIndex), oldIndexDir);
    +    Directory dir = newFSDirectory(oldIndexDir);
    +    verifyUsesDefaultCodec(dir, dvUpdatesIndex);
    +
    +    // update fields and verify index
    +    IndexWriterConfig conf = new IndexWriterConfig(new MockAnalyzer(random()));
    +    IndexWriter writer = new IndexWriter(dir, conf);
    +    // introduce a new field that we later update
    +    writer.addDocument(Arrays.asList(new StringField("id", "" + Integer.MAX_VALUE, Field.Store.NO),
    +        new NumericDocValuesField("new_numeric", 1),
    +        new BinaryDocValuesField("new_binary", toBytes(1))));
    +    writer.updateNumericDocValue(new Term("id", "1"), "new_numeric", 1);
    +    writer.updateBinaryDocValue(new Term("id", "1"), "new_binary", toBytes(1));
    +
    +    writer.commit();
    +    Callable assertDV = () -> {
    --- End diff --
   
    it throws an exception, hence the Callable


---


[GitHub] lucene-solr pull request #516: LUCENE-8594: DV update are broken for updates...

dnhatn
In reply to this post by dnhatn
Github user s1monw commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/516#discussion_r239408501
 
    --- Diff: lucene/backward-codecs/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java ---
    @@ -1587,6 +1590,76 @@ public void testDocValuesUpdates() throws Exception {
         writer.close();
         dir.close();
       }
    +
    +  public void testSoftDeletes() throws Exception {
    +    Path oldIndexDir = createTempDir("dvupdates");
    +    TestUtil.unzip(getDataInputStream(dvUpdatesIndex), oldIndexDir);
    +    Directory dir = newFSDirectory(oldIndexDir);
    +    verifyUsesDefaultCodec(dir, dvUpdatesIndex);
    +    IndexWriterConfig conf = new IndexWriterConfig(new MockAnalyzer(random())).setSoftDeletesField("__soft_delete");
    +    IndexWriter writer = new IndexWriter(dir, conf);
    +    int maxDoc = writer.maxDoc();
    +    writer.updateDocValues(new Term("id", "1"),new NumericDocValuesField("__soft_delete", 1));
    +
    +    if (random().nextBoolean()) {
    +      writer.commit();
    +    }
    +    writer.forceMerge(1);
    +    writer.commit();
    +    assertEquals(maxDoc-1, writer.maxDoc());
    +    writer.close();
    +    dir.close();
    +  }
    +
    +  public void testDocValuesUpdatesWithNewField() throws Exception {
    +    Path oldIndexDir = createTempDir("dvupdates");
    +    TestUtil.unzip(getDataInputStream(dvUpdatesIndex), oldIndexDir);
    +    Directory dir = newFSDirectory(oldIndexDir);
    +    verifyUsesDefaultCodec(dir, dvUpdatesIndex);
    +
    +    // update fields and verify index
    +    IndexWriterConfig conf = new IndexWriterConfig(new MockAnalyzer(random()));
    +    IndexWriter writer = new IndexWriter(dir, conf);
    +    // introduce a new field that we later update
    +    writer.addDocument(Arrays.asList(new StringField("id", "" + Integer.MAX_VALUE, Field.Store.NO),
    +        new NumericDocValuesField("new_numeric", 1),
    +        new BinaryDocValuesField("new_binary", toBytes(1))));
    +    writer.updateNumericDocValue(new Term("id", "1"), "new_numeric", 1);
    +    writer.updateBinaryDocValue(new Term("id", "1"), "new_binary", toBytes(1));
    +
    +    writer.commit();
    +    Callable assertDV = () -> {
    +      boolean found = false;
    +      try (DirectoryReader reader = DirectoryReader.open(dir)) {
    +        for (LeafReaderContext ctx : reader.leaves()) {
    +          LeafReader leafReader = ctx.reader();
    +          TermsEnum id = leafReader.terms("id").iterator();
    +          if (id.seekExact(new BytesRef("1"))) {
    +            PostingsEnum postings = id.postings(null, PostingsEnum.NONE);
    +            NumericDocValues numericDocValues = leafReader.getNumericDocValues("new_numeric");
    +            BinaryDocValues binaryDocValues = leafReader.getBinaryDocValues("new_binary");
    +            int doc;
    +            while ((doc = postings.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
    +              found = true;
    +              assertEquals(doc, binaryDocValues.nextDoc());
    +              assertEquals(doc, numericDocValues.nextDoc());
    +              assertEquals(1, numericDocValues.longValue());
    +              assertEquals(toBytes(1), binaryDocValues.binaryValue());
    +            }
    +          }
    --- End diff --
   
    👍


---


[GitHub] lucene-solr pull request #516: LUCENE-8594: DV update are broken for updates...

dnhatn
In reply to this post by dnhatn
Github user s1monw commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/516#discussion_r239409024
 
    --- Diff: lucene/backward-codecs/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java ---
    @@ -1587,6 +1590,76 @@ public void testDocValuesUpdates() throws Exception {
         writer.close();
         dir.close();
       }
    +
    +  public void testSoftDeletes() throws Exception {
    +    Path oldIndexDir = createTempDir("dvupdates");
    +    TestUtil.unzip(getDataInputStream(dvUpdatesIndex), oldIndexDir);
    +    Directory dir = newFSDirectory(oldIndexDir);
    +    verifyUsesDefaultCodec(dir, dvUpdatesIndex);
    +    IndexWriterConfig conf = new IndexWriterConfig(new MockAnalyzer(random())).setSoftDeletesField("__soft_delete");
    +    IndexWriter writer = new IndexWriter(dir, conf);
    +    int maxDoc = writer.maxDoc();
    +    writer.updateDocValues(new Term("id", "1"),new NumericDocValuesField("__soft_delete", 1));
    +
    +    if (random().nextBoolean()) {
    +      writer.commit();
    +    }
    +    writer.forceMerge(1);
    +    writer.commit();
    +    assertEquals(maxDoc-1, writer.maxDoc());
    +    writer.close();
    +    dir.close();
    +  }
    +
    +  public void testDocValuesUpdatesWithNewField() throws Exception {
    +    Path oldIndexDir = createTempDir("dvupdates");
    +    TestUtil.unzip(getDataInputStream(dvUpdatesIndex), oldIndexDir);
    +    Directory dir = newFSDirectory(oldIndexDir);
    +    verifyUsesDefaultCodec(dir, dvUpdatesIndex);
    +
    +    // update fields and verify index
    +    IndexWriterConfig conf = new IndexWriterConfig(new MockAnalyzer(random()));
    +    IndexWriter writer = new IndexWriter(dir, conf);
    +    // introduce a new field that we later update
    +    writer.addDocument(Arrays.asList(new StringField("id", "" + Integer.MAX_VALUE, Field.Store.NO),
    +        new NumericDocValuesField("new_numeric", 1),
    +        new BinaryDocValuesField("new_binary", toBytes(1))));
    +    writer.updateNumericDocValue(new Term("id", "1"), "new_numeric", 1);
    +    writer.updateBinaryDocValue(new Term("id", "1"), "new_binary", toBytes(1));
    +
    +    writer.commit();
    +    Callable assertDV = () -> {
    +      boolean found = false;
    +      try (DirectoryReader reader = DirectoryReader.open(dir)) {
    +        for (LeafReaderContext ctx : reader.leaves()) {
    +          LeafReader leafReader = ctx.reader();
    +          TermsEnum id = leafReader.terms("id").iterator();
    +          if (id.seekExact(new BytesRef("1"))) {
    +            PostingsEnum postings = id.postings(null, PostingsEnum.NONE);
    +            NumericDocValues numericDocValues = leafReader.getNumericDocValues("new_numeric");
    +            BinaryDocValues binaryDocValues = leafReader.getBinaryDocValues("new_binary");
    +            int doc;
    +            while ((doc = postings.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
    +              found = true;
    +              assertEquals(doc, binaryDocValues.nextDoc());
    +              assertEquals(doc, numericDocValues.nextDoc());
    +              assertEquals(1, numericDocValues.longValue());
    +              assertEquals(toBytes(1), binaryDocValues.binaryValue());
    +            }
    +          }
    --- End diff --
   
    well, this doesn't work since we have another doc with a value in this field, I guess, and it might be merged into a single segment, which we also force


---


[GitHub] lucene-solr issue #516: LUCENE-8594: DV update are broken for updates on new...

dnhatn
In reply to this post by dnhatn
Github user s1monw commented on the issue:

    https://github.com/apache/lucene-solr/pull/516
 
    @jpountz I pushed changes


---


[GitHub] lucene-solr pull request #516: LUCENE-8594: DV update are broken for updates...

dnhatn
In reply to this post by dnhatn
Github user mikemccand commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/516#discussion_r239426275
 
    --- Diff: lucene/backward-codecs/src/java/org/apache/lucene/codecs/lucene70/Lucene70Codec.java ---
    @@ -66,7 +67,7 @@ public PostingsFormat getPostingsFormatForField(String field) {
       private final DocValuesFormat docValuesFormat = new PerFieldDocValuesFormat() {
         @Override
         public DocValuesFormat getDocValuesFormatForField(String field) {
    -      throw new IllegalStateException("This codec should only be used for reading, not writing");
    +      return defaultDVFormat;
    --- End diff --
   
    Ahh this is because the DV updates must also be written with the old codec?


---


[GitHub] lucene-solr pull request #516: LUCENE-8594: DV update are broken for updates...

dnhatn
In reply to this post by dnhatn
Github user jpountz commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/516#discussion_r239448053
 
    --- Diff: lucene/backward-codecs/src/java/org/apache/lucene/codecs/lucene70/Lucene70Codec.java ---
    @@ -66,7 +67,7 @@ public PostingsFormat getPostingsFormatForField(String field) {
       private final DocValuesFormat docValuesFormat = new PerFieldDocValuesFormat() {
         @Override
         public DocValuesFormat getDocValuesFormatForField(String field) {
    -      throw new IllegalStateException("This codec should only be used for reading, not writing");
    +      return defaultDVFormat;
    --- End diff --
   
    I don't think it's a requirement, but I like using the old format better:
     - it's probably easier to debug, as all fields would be using the same format
     - given that PFDVFormat creates different files per format, forcing the new format would increase the number of files for old segments
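
The file-count point can be sketched with a toy model (file names here are hypothetical, not Lucene's actual extensions): a per-field doc-values dispatch writes one set of files per distinct format used in a segment, so forcing new fields of an upgraded segment onto the new format doubles its doc-values files versus keeping every field on the old format.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Toy model of per-field doc-values file layout: one data + one metadata
// file per distinct format in the segment, regardless of how many fields
// share that format.
class PerFieldFilesSketch {
    // fieldFormats: the DV format chosen for each field in the segment
    static Set<String> dvFiles(String segment, String... fieldFormats) {
        Set<String> files = new TreeSet<>();
        for (String format : new TreeSet<>(Arrays.asList(fieldFormats))) {
            files.add(segment + "_" + format + ".dvd"); // data file per format
            files.add(segment + "_" + format + ".dvm"); // metadata file per format
        }
        return files;
    }
}
```

With all fields on "Lucene70" a segment carries one pair of files; mixing "Lucene70" and "Lucene80" carries two pairs.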


---


[GitHub] lucene-solr pull request #516: LUCENE-8594: DV update are broken for updates...

dnhatn
In reply to this post by dnhatn
Github user asfgit closed the pull request at:

    https://github.com/apache/lucene-solr/pull/516


---
