[jira] Created: (LUCENE-991) BoostingTermQuery.explain() bugs

JIRA jira@apache.org
BoostingTermQuery.explain() bugs
--------------------------------

                 Key: LUCENE-991
                 URL: https://issues.apache.org/jira/browse/LUCENE-991
             Project: Lucene - Java
          Issue Type: Bug
          Components: Search
    Affects Versions: 2.2
            Reporter: Peter Keegan
            Priority: Minor
         Attachments: TestBoostingTermQuery.patch

There are a couple of minor bugs in BoostingTermQuery.explain().

1. The computation of average payload score produces NaN if no payloads were found. It should probably be:
float avgPayloadScore = super.score() * (payloadsSeen > 0 ? (payloadScore / payloadsSeen) : 1);

2. If the average payload score is zero, the value of the explanation is 0:
result.setValue(nonPayloadExpl.getValue() * avgPayloadScore);
If the query is part of a BooleanClause, this results in:
"no match on required clause..."
"failure to meet condition(s) of required/prohibited clause(s)"

The average payload score can be zero if the field boost = 0.
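Both symptoms can be reproduced with a minimal standalone Java sketch that uses plain floats in place of Lucene's Explanation objects. The class and method names and the 0.5 stand-in for nonPayloadExpl.getValue() are hypothetical. The guard mirrors the ternary proposed in #1 but omits its super.score() factor.

```java
// Hypothetical standalone sketch; not Lucene's BoostingTermQuery code.
class PayloadAverageDemo {

    // Unguarded average, as described in bug #1: 0f / 0 is NaN in Java float arithmetic.
    static float naiveAverage(float payloadScore, int payloadsSeen) {
        return payloadScore / payloadsSeen;
    }

    // Guarded average: fall back to a neutral factor of 1 when no payloads were seen.
    static float guardedAverage(float payloadScore, int payloadsSeen) {
        return payloadsSeen > 0 ? payloadScore / payloadsSeen : 1f;
    }

    public static void main(String[] args) {
        float nonPayloadValue = 0.5f; // hypothetical stand-in for nonPayloadExpl.getValue()

        System.out.println(nonPayloadValue * naiveAverage(0f, 0));   // NaN propagates into the explanation
        System.out.println(nonPayloadValue * guardedAverage(0f, 0)); // 0.5

        // Bug #2: when the average payload score is 0, the whole explanation value becomes 0.
        System.out.println(nonPayloadValue * guardedAverage(0f, 3)); // 0.0
    }
}
```

The guard keeps a document with no payloads at its non-payload score; it does not address #2, where a genuine zero average still yields an explanation value of 0.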

I've attached a patch to 'TestBoostingTermQuery.java'; however, the test 'testNoPayload' fails in 'SpanScorer.score()' because doc = -1. It looks like 'setFreqCurrentDoc()' should have been called before 'score()'. Maybe someone more knowledgeable about spans could investigate this.
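The call-order problem can be illustrated with a hypothetical, simplified scorer (not Lucene's actual SpanScorer). The current document starts at -1 and only becomes valid after an advance step that plays the role of setFreqCurrentDoc(), so a score() call made before that step indexes a per-document array with -1. The class name, the norms array, and the sqrt(freq) factor are assumptions chosen for illustration.

```java
// Hypothetical, heavily simplified model of the call-order issue; not Lucene's SpanScorer.
class PositionedScorerDemo {
    private final float[] norms;      // one normalization factor per document (stand-in)
    private final int[] matchingDocs; // documents this scorer will visit, in order
    private int index = -1;
    private int doc = -1;             // unpositioned until the scorer is advanced
    private float freq = 0f;

    PositionedScorerDemo(float[] norms, int[] matchingDocs) {
        this.norms = norms;
        this.matchingDocs = matchingDocs;
    }

    // Advances to the next matching doc; plays the role of setFreqCurrentDoc().
    boolean next() {
        index++;
        if (index >= matchingDocs.length) return false;
        doc = matchingDocs[index];
        freq = 1f; // pretend each matching doc has exactly one span
        return true;
    }

    // Scores the current doc; with doc == -1 the norms lookup throws.
    float score() {
        return (float) Math.sqrt(freq) * norms[doc];
    }

    public static void main(String[] args) {
        PositionedScorerDemo s = new PositionedScorerDemo(new float[] {0.5f, 1f}, new int[] {0});
        try {
            s.score(); // called before next(): doc is still -1
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("score() before positioning: " + e.getClass().getSimpleName());
        }
        if (s.next()) System.out.println(s.score()); // 0.5
    }
}
```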


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

[jira] Updated: (LUCENE-991) BoostingTermQuery.explain() bugs

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Keegan updated LUCENE-991:
--------------------------------

    Attachment: TestBoostingTermQuery.patch

Added 'testNoPayload'

[jira] Assigned: (LUCENE-991) BoostingTermQuery.explain() bugs

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll reassigned LUCENE-991:
--------------------------------------

    Assignee: Grant Ingersoll

[jira] Commented: (LUCENE-991) BoostingTermQuery.explain() bugs

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525695 ]

Grant Ingersoll commented on LUCENE-991:
----------------------------------------

Hi Peter,

A couple of comments. #1 makes sense, except for the super.score() part: the score from the other part of the matching is handled by the nonPayloadExpl part. I do agree it should check for zero on payloadsSeen, though, and have added that.
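The double-counting concern can be checked with a small hypothetical sketch. It assumes that super.score() and nonPayloadExpl.getValue() both carry the same span score for the document; all names and numbers here are illustrative, not Lucene code.

```java
// Hypothetical sketch comparing the two ways of computing the explanation value.
// Assumption: super.score() and nonPayloadExpl.getValue() both carry the same span score.
class ExplainValueDemo {

    // Average as proposed in the report: the span score is folded into the average.
    static float proposed(float spanScore, float payloadScore, int payloadsSeen) {
        return spanScore * (payloadsSeen > 0 ? payloadScore / payloadsSeen : 1f);
    }

    // Average with only the zero check: the span score stays in nonPayloadExpl.
    static float guardedOnly(float payloadScore, int payloadsSeen) {
        return payloadsSeen > 0 ? payloadScore / payloadsSeen : 1f;
    }

    public static void main(String[] args) {
        float spanScore = 0.5f;  // hypothetical span score
        float payloadScore = 4f; // hypothetical sum of payload scores
        int payloadsSeen = 2;

        // Explanation value = nonPayloadExpl.getValue() * avgPayloadScore.
        float withSuperScore = spanScore * proposed(spanScore, payloadScore, payloadsSeen);
        float withGuardOnly = spanScore * guardedOnly(payloadScore, payloadsSeen);

        System.out.println(withSuperScore); // 0.5: span score counted twice (0.5 * 0.5 * 2)
        System.out.println(withGuardOnly);  // 1.0: span score counted once (0.5 * 2)
    }
}
```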

I don't think I am understanding the issue with #2 above. I am not sure the test is correct. The results[0] being passed into checkHitCollector says you expect Document 0 to be a match, but this can't be, since the boost is 0 and therefore there are no results. This can be seen by running the query against the searcher without the explain, as in:
TopDocs hits = searcher.search(query, null, 100);
assertTrue("hits Size: " + hits.totalHits + " is not: " + 0, hits.totalHits == 0);

Or, perhaps I am missing something?  I guess I don't see why the boost part needs to be in there?  Can't you have a test that has no payloads?


[jira] Updated: (LUCENE-991) BoostingTermQuery.explain() bugs

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated LUCENE-991:
-----------------------------------

    Attachment: TestBoostingTermQuery2.patch

But I agree, there is something wrong here. Attached is an update of the test, plus a fix for #1.

[jira] Updated: (LUCENE-991) BoostingTermQuery.explain() bugs

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated LUCENE-991:
-----------------------------------

    Attachment: TestBoostingTermQuery3.patch

OK, I think I see the problem.

The issue lies in the fact that the Similarity override for this test sets the tf() to 1, regardless of the frequency coming in.  Thus, for the "foo" clause, it

Let me know what you think of this patch.
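To see what such an override changes, here is a standalone sketch comparing a default-style tf (Lucene's DefaultSimilarity computes tf as the square root of the frequency) with a constant tf of 1, as the comment describes for this test. These are plain static methods, not subclasses of Lucene's Similarity, so the names are hypothetical. The sketch only shows the two tf curves and does not reconstruct the rest of the (truncated) explanation above.

```java
// Hypothetical comparison of two tf() behaviours; not a Lucene Similarity subclass.
class TfDemo {

    // Default-style tf: square root of the raw term frequency.
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // The test's override as described: tf is 1 regardless of the incoming frequency.
    static float constantTf(float freq) {
        return 1f;
    }

    public static void main(String[] args) {
        for (float freq : new float[] {0f, 1f, 4f}) {
            System.out.println("freq=" + freq
                    + " default=" + defaultTf(freq)
                    + " constant=" + constantTf(freq));
        }
    }
}
```

Note that the constant version returns 1 even for a frequency of 0, whereas the square-root version returns 0.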

[jira] Commented: (LUCENE-991) BoostingTermQuery.explain() bugs

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525709 ]

Peter Keegan commented on LUCENE-991:
-------------------------------------



Hi Grant,

> TopDocs hits = searcher.search(query, null, 100);
> assertTrue("hits Size: " + hits.totalHits + " is not: " + 0, hits.totalHits == 0);

TopDocCollector discards hits with score = 0, so that's not a fair comparison. If you do a similar test with TermQuery (with a field boost = 0) instead of BoostingTermQuery, you'll see the difference. Even terms with 0 weight are included in the explanation. Make sense?
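Peter's point can be modeled with a hypothetical, simplified collector that, like the behavior he describes for TopDocCollector, only counts hits whose score is greater than zero. This is not Lucene's class; the name and fields are illustrative. A document that matches with score 0 never reaches totalHits, even though the explanation still reports on it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified collector mirroring the behaviour described above; not Lucene's class.
class ZeroScoreFilteringCollector {
    int totalHits = 0;
    final List<Integer> docs = new ArrayList<>();

    void collect(int doc, float score) {
        if (score > 0.0f) { // zero-score matches are silently dropped
            totalHits++;
            docs.add(doc);
        }
    }

    public static void main(String[] args) {
        ZeroScoreFilteringCollector c = new ZeroScoreFilteringCollector();
        c.collect(0, 0.0f); // doc 0 matches, but a field boost of 0 gives it score 0
        System.out.println(c.totalHits); // 0
    }
}
```

So asserting totalHits == 0 only shows that the hit was discarded, not that document 0 failed to match.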

Peter

[jira] Updated: (LUCENE-991) BoostingTermQuery.explain() bugs

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated LUCENE-991:
-----------------------------------

    Attachment: TestBoostingTermQuery4.patch

OK, I added the setBoost(0) back in, but kept the Similarity change, and the test passes.

[jira] Commented: (LUCENE-991) BoostingTermQuery.explain() bugs

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525734 ]

Peter Keegan commented on LUCENE-991:
-------------------------------------

Confirmed - thanks.

[jira] Resolved: (LUCENE-991) BoostingTermQuery.explain() bugs

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll resolved LUCENE-991.
------------------------------------

       Resolution: Fixed
    Lucene Fields: [Patch Available]  (was: [Patch Available, New])

Committed.
