[jira] Created: (LUCENE-2142) FieldCache.getStringIndex should not throw exception if term count exceeds doc count

[jira] Created: (LUCENE-2142) FieldCache.getStringIndex should not throw exception if term count exceeds doc count

Prajeeth Emanuel (Jira)
FieldCache.getStringIndex should not throw exception if term count exceeds doc count
------------------------------------------------------------------------------------

                 Key: LUCENE-2142
                 URL: https://issues.apache.org/jira/browse/LUCENE-2142
             Project: Lucene - Java
          Issue Type: Bug
          Components: Search
            Reporter: Michael McCandless
            Assignee: Michael McCandless
            Priority: Minor
             Fix For: 3.1


Spinoff of LUCENE-2133/LUCENE-831.

Currently FieldCache cannot handle more than one value per field.
We may someday want to fix that... but until that day:

FieldCache.getStringIndex currently does a simplistic check to try to
catch when you've accidentally allowed more than one term per field,
by testing if the number of unique terms exceeds the number of
documents.

The problem is, this is not a perfect check, in that it allows false
negatives (you could have more than one term per field for some docs
and the check won't catch you).
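
To make the false negative concrete, here is a minimal sketch of the heuristic as described above. It is not the actual FieldCache code: the class name, `checkTrips`, and the in-memory list-of-lists index are hypothetical stand-ins for illustration only.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not taken from Lucene's FieldCache.
class TermCountCheckSketch {

    /** Returns true when the number of unique terms exceeds the number of documents. */
    static boolean checkTrips(List<List<String>> docs) {
        Set<String> unique = new TreeSet<>();
        for (List<String> terms : docs) {
            unique.addAll(terms);
        }
        return unique.size() > docs.size();
    }

    public static void main(String[] args) {
        // Doc 0 has two terms, but the other docs reuse them: 2 unique terms, 3 docs.
        List<List<String>> missed = List.of(
            List.of("apple", "banana"),
            List.of("apple"),
            List.of("banana"));
        // Every doc has two distinct terms: 4 unique terms, 2 docs.
        List<List<String>> caught = List.of(
            List.of("apple", "banana"),
            List.of("cherry", "date"));
        System.out.println("missed: " + checkTrips(missed)); // prints "missed: false"
        System.out.println("caught: " + checkTrips(caught)); // prints "caught: true"
    }
}
```

The first index has a multi-valued document yet passes the check, because the check only compares totals and never looks at individual documents.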

Further, the exception thrown is the unchecked RuntimeException.

So this means... you could happily think all is good, until some day,
well into production, once you've updated enough docs, suddenly the
check will catch you and throw an unhandled exception, stopping all
searches [that need to sort by this string field] in their tracks.
It's not gracefully degrading.

I think we should simply remove the test, i.e., if you have more terms
than docs, then the terms simply overwrite one another.
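
A rough sketch of what "overwrite one another" could look like, assuming terms are visited in sorted order and each document keeps a single slot. The `lookup`/`order` names and the growable list are assumptions made for this illustration and do not reproduce the real FieldCache internals or the committed fix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not taken from Lucene's FieldCache.
class OverwriteSketch {

    /** Sorted unique terms; slot 0 is reserved for "document has no term". */
    final List<String> lookup = new ArrayList<>();
    /** order[doc] is the index into lookup of the term kept for that document. */
    final int[] order;

    OverwriteSketch(List<List<String>> docs) {
        order = new int[docs.size()];
        lookup.add(null);
        // Invert docs into term -> doc ids, sorted by term.
        TreeMap<String, List<Integer>> postings = new TreeMap<>();
        for (int doc = 0; doc < docs.size(); doc++) {
            for (String term : docs.get(doc)) {
                postings.computeIfAbsent(term, t -> new ArrayList<>()).add(doc);
            }
        }
        for (Map.Entry<String, List<Integer>> e : postings.entrySet()) {
            lookup.add(e.getKey());
            int ord = lookup.size() - 1;
            for (int doc : e.getValue()) {
                order[doc] = ord; // no check: a later term silently replaces an earlier one
            }
        }
    }

    String termFor(int doc) {
        return lookup.get(order[doc]);
    }

    public static void main(String[] args) {
        // 4 unique terms across 3 docs: more terms than docs, and nothing is thrown.
        OverwriteSketch index = new OverwriteSketch(List.of(
            List.of("apple", "cherry"),
            List.of("banana", "date"),
            List.<String>of()));
        for (int doc = 0; doc < 3; doc++) {
            System.out.println("doc " + doc + " -> " + index.termFor(doc));
        }
    }
}
```

In this sketch each multi-valued document ends up with its last term in sort order ("cherry" and "date"), so sorting still works, just on an arbitrary one of the values.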


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

[jira] Commented: (LUCENE-2142) FieldCache.getStringIndex should not throw exception if term count exceeds doc count

Prajeeth Emanuel (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788925#action_12788925 ]

Yonik Seeley commented on LUCENE-2142:
--------------------------------------

+1


[jira] Resolved: (LUCENE-2142) FieldCache.getStringIndex should not throw exception if term count exceeds doc count

Prajeeth Emanuel (Jira)
In reply to this post by Prajeeth Emanuel (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2142.
----------------------------------------

    Resolution: Fixed


[jira] Commented: (LUCENE-2142) FieldCache.getStringIndex should not throw exception if term count exceeds doc count

Prajeeth Emanuel (Jira)
In reply to this post by Prajeeth Emanuel (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789474#action_12789474 ]

Earwin Burrfoot commented on LUCENE-2142:
-----------------------------------------

+1
