[jira] [Created] (LUCENE-4248) Producers to the Codec API don't always follow the spec


JIRA jira@apache.org
Robert Muir created LUCENE-4248:
-----------------------------------

             Summary: Producers to the Codec API don't always follow the spec
                 Key: LUCENE-4248
                 URL: https://issues.apache.org/jira/browse/LUCENE-4248
             Project: Lucene - Core
          Issue Type: Bug
            Reporter: Robert Muir
         Attachments: LUCENE-4248.patch

We added AssertingCodec etc. and have lots of tests checking that consumers of the codec API follow a strict set of rules, but nothing checks the producers feeding these APIs (IndexWriter, codec merge implementations, etc.).

We should beef up AssertingCodec to validate these things too, so that we know the API is being followed.

Simple examples include checking that producers feed terms to the consumers in an order consistent with their comparator, that they don't provide bogus or out-of-band statistics, and that they invoke the right methods consistently (e.g. not forgetting to call finishDoc, or anything else that might confuse someone's codec).
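To make the first two checks concrete, here is a minimal sketch of a wrapper that sits between a producer and a consumer and rejects out-of-order terms and impossible statistics. It is hypothetical and simplified: terms are modeled as Strings, the class and method names (OrderCheckingTermsSink, finishTerm, finish) are illustrative rather than Lucene's actual TermsConsumer API, and maxDoc and sumDocFreq are assumed stand-ins for the segment's real statistics.

```java
import java.util.Comparator;

// Hypothetical, simplified sketch (not Lucene's real TermsConsumer API): terms are
// modeled as Strings and the consumer's comparator is passed in by the caller.
class OrderCheckingTermsSink {
  private final Comparator<String> comparator; // order the consumer expects
  private final int maxDoc;                    // number of docs in the segment
  private String lastTerm;                     // null until the first term arrives
  private long sumDocFreq;                     // recomputed from per-term docFreqs

  OrderCheckingTermsSink(Comparator<String> comparator, int maxDoc) {
    this.comparator = comparator;
    this.maxDoc = maxDoc;
  }

  // Called by the producer once per finished term.
  void finishTerm(String term, int docFreq) {
    // Terms must arrive in strictly increasing order under the consumer's comparator.
    if (lastTerm != null && comparator.compare(lastTerm, term) >= 0) {
      throw new IllegalStateException("term out of order: " + term + " after " + lastTerm);
    }
    // A finished term must occur in at least one doc, and in no more docs than exist.
    if (docFreq < 1 || docFreq > maxDoc) {
      throw new IllegalStateException("bogus docFreq " + docFreq + " for term " + term);
    }
    sumDocFreq += docFreq;
    lastTerm = term;
  }

  // Called once per field with the producer's own field-level statistic,
  // which must agree with the per-term values it already reported.
  void finish(long reportedSumDocFreq) {
    if (reportedSumDocFreq != sumDocFreq) {
      throw new IllegalStateException(
          "sumDocFreq " + reportedSumDocFreq + " does not match per-term total " + sumDocFreq);
    }
  }
}
```

The checks throw IllegalStateException rather than using Java `assert`, so this sketch runs its checks even without `-ea`; a test-only codec could equally use asserts.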

This is also nice because we now have quite a few tests (TestCodecs, TestPostingsFormat, etc.) that feed these APIs directly, so it could find some test bugs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


[jira] [Updated] (LUCENE-4248) Producers to the Codec API don't always follow the spec

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-4248:
--------------------------------

    Attachment: LUCENE-4248.patch

The start of a patch: some tests still fail.

I figure we can get everything cleaned up for postings first, and then, if we feel like it, add checks for the other parts of the codec API later.
               


[jira] [Updated] (LUCENE-4248) Producers to the Codec API don't always follow the spec

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-4248:
--------------------------------

    Attachment: LUCENE-4248.patch

Updated patch: fixing some more bugs in these producers.

I added a simple state machine as well, but because of the rule that "startTerm without a corresponding finishTerm is allowed if all docs are deleted for that term", the check is not very strict right now.

Once we add an AssertingPostingsConsumer of some sort, we can actually validate that no docs were added in that case, and I think it will be fine...

But I'd like to commit this for now as a start.
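The stricter check described above (tolerating a startTerm with no matching finishTerm only when no docs were added) can be sketched as a small state machine. This is a hypothetical illustration: the class name TermStateChecker and its methods only mirror the thread's wording and are not Lucene code, and the rule that finishTerm needs at least one doc is an assumption drawn from the "all docs deleted" case.

```java
// Hypothetical sketch of the term-level state machine; method names mirror the
// thread's wording, but the class is illustrative, not Lucene code.
class TermStateChecker {
  private enum State { BETWEEN_TERMS, IN_TERM, FINISHED }

  private State state = State.BETWEEN_TERMS;
  private int docsInCurrentTerm; // docs added since the last startTerm

  void startTerm() {
    requireNotFinished();
    // Abandoning the previous term is only legal if it received no docs,
    // e.g. because every doc containing it was deleted.
    requireNoUnfinishedDocs("startTerm");
    state = State.IN_TERM;
    docsInCurrentTerm = 0;
  }

  void startDoc() {
    if (state != State.IN_TERM) {
      throw new IllegalStateException("startDoc called outside of a term");
    }
    docsInCurrentTerm++;
  }

  void finishTerm() {
    if (state != State.IN_TERM) {
      throw new IllegalStateException("finishTerm called without startTerm");
    }
    // Assumption: a term is only finished if at least one doc was added for it.
    if (docsInCurrentTerm == 0) {
      throw new IllegalStateException("finishTerm called for a term with no docs");
    }
    state = State.BETWEEN_TERMS;
  }

  void finish() {
    requireNotFinished();
    requireNoUnfinishedDocs("finish");
    state = State.FINISHED;
  }

  private void requireNotFinished() {
    if (state == State.FINISHED) {
      throw new IllegalStateException("already finished");
    }
  }

  private void requireNoUnfinishedDocs(String method) {
    if (state == State.IN_TERM && docsInCurrentTerm > 0) {
      throw new IllegalStateException(method + " called before finishTerm on a term with docs");
    }
  }
}
```

Counting docs per term is what turns the weak check into a strict one: an abandoned term is only accepted when its doc count is zero.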
               


[jira] [Updated] (LUCENE-4248) Producers to the Codec API don't always follow the spec

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-4248:
--------------------------------

    Attachment: LUCENE-4248.patch

One more check, plus a fix for a bad assert in the BlockTree writer.
               


[jira] [Updated] (LUCENE-4248) Producers to the Codec API don't always follow the spec

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-4248:
--------------------------------

    Attachment: LUCENE-4248.patch

Here's a patch for the rest of the postings API.

FreqProxTermsWriter was inconsistent here, depending on when the omitTF bit got set during indexing.

I added javadocs for these APIs to clarify that these values (freq, offsets, etc.) are all -1 when they are not being indexed.
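A producer-side check for the "-1 when not indexed" convention might look like the sketch below. It is hypothetical: the two boolean flags stand in for the field's index options, and the class and method names (PostingsValueChecker, checkDoc, checkPosition) are illustrative, not Lucene's API.

```java
// Hypothetical check of the "-1 when not indexed" convention. The flags stand in for
// the field's index options; the class is illustrative, not Lucene's API.
class PostingsValueChecker {
  private final boolean freqsIndexed;
  private final boolean offsetsIndexed;

  PostingsValueChecker(boolean freqsIndexed, boolean offsetsIndexed) {
    // Offsets are attached to positions, which in turn require freqs.
    if (offsetsIndexed && !freqsIndexed) {
      throw new IllegalArgumentException("offsets cannot be indexed without freqs");
    }
    this.freqsIndexed = freqsIndexed;
    this.offsetsIndexed = offsetsIndexed;
  }

  // Validates the values a producer passes when it starts a document.
  void checkDoc(int docID, int freq) {
    if (docID < 0) {
      throw new IllegalStateException("negative docID " + docID);
    }
    // With freqs indexed, every doc holds the term at least once; otherwise freq must be -1.
    if (freqsIndexed ? freq < 1 : freq != -1) {
      throw new IllegalStateException("doc " + docID + ": freq " + freq
          + (freqsIndexed ? " must be >= 1" : " must be -1 when freqs are not indexed"));
    }
  }

  // Validates the values a producer passes for each position.
  void checkPosition(int position, int startOffset, int endOffset) {
    if (position < 0) {
      throw new IllegalStateException("negative position " + position);
    }
    if (offsetsIndexed) {
      if (startOffset < 0 || endOffset < startOffset) {
        throw new IllegalStateException("bad offsets [" + startOffset + ", " + endOffset + "]");
      }
    } else if (startOffset != -1 || endOffset != -1) {
      throw new IllegalStateException("offsets must be -1 when they are not indexed");
    }
  }
}
```

A producer that only decides partway through indexing whether a field omits term freqs, as described for FreqProxTermsWriter, would trip checkDoc as soon as it passed a real freq for a DOCS_ONLY-style field, or -1 for one with freqs.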

TestCodecs didn't call finishDoc(); other than that, things look good.
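A missing finishDoc() is the kind of slip a doc-level state machine catches. The sketch below is hypothetical (DocStateChecker and its methods are illustrative names, not Lucene classes). It assumes docIDs must increase within a term and, when positions are indexed, that the number of addPosition calls must equal the freq passed to startDoc.

```java
// Hypothetical doc-level checker: every startDoc must be closed by finishDoc before
// the next startDoc or the end of the term. Illustrative only, not Lucene code.
class DocStateChecker {
  private final boolean positionsIndexed;
  private boolean docOpen;       // true between startDoc and finishDoc
  private int lastDocID = -1;    // reset at the end of each term
  private int expectedFreq;      // freq passed to startDoc
  private int positionsSeen;     // addPosition calls for the open doc

  DocStateChecker(boolean positionsIndexed) {
    this.positionsIndexed = positionsIndexed;
  }

  void startDoc(int docID, int freq) {
    if (docOpen) {
      throw new IllegalStateException("startDoc called before finishDoc for doc " + lastDocID);
    }
    if (docID <= lastDocID) {
      throw new IllegalStateException("docIDs out of order: " + docID + " after " + lastDocID);
    }
    docOpen = true;
    lastDocID = docID;
    expectedFreq = freq;
    positionsSeen = 0;
  }

  void addPosition() {
    if (!docOpen) {
      throw new IllegalStateException("addPosition called outside of a doc");
    }
    if (!positionsIndexed) {
      throw new IllegalStateException("addPosition called but positions are not indexed");
    }
    positionsSeen++;
  }

  void finishDoc() {
    if (!docOpen) {
      throw new IllegalStateException("finishDoc called without startDoc");
    }
    // Assumption: one addPosition call per occurrence of the term in the doc.
    if (positionsIndexed && positionsSeen != expectedFreq) {
      throw new IllegalStateException(
          "doc " + lastDocID + ": freq " + expectedFreq + " but " + positionsSeen + " positions");
    }
    docOpen = false;
  }

  // Called when the enclosing term is finished.
  void finishTerm() {
    if (docOpen) {
      throw new IllegalStateException("finishDoc never called for doc " + lastDocID);
    }
    lastDocID = -1;
  }
}
```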
               


[jira] [Resolved] (LUCENE-4248) Producers to the Codec API don't always follow the spec

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-4248.
---------------------------------

       Resolution: Fixed
    Fix Version/s: 5.0
                   4.0
   
