[jira] Created: (LUCENE-933) QueryParser can produce empty sub BooleanQueries when Analyzer produces no tokens for input


ASF GitHub Bot (Jira)
QueryParser can produce empty sub BooleanQueries when Analyzer produces no tokens for input
-------------------------------------------------------------------------------------------

                 Key: LUCENE-933
                 URL: https://issues.apache.org/jira/browse/LUCENE-933
             Project: Lucene - Java
          Issue Type: Bug
            Reporter: Hoss Man


as triggered by SOLR-261, if you have a query like this...

   +foo:BBB  +(yak:AAA  baz:CCC)

...where the analyzer produces no tokens for the "yak:AAA" or "baz:CCC" portions of the query (possibly because they are stop words) the resulting query produced by the QueryParser will be...

  +foo:BBB +()

...that is a BooleanQuery with two required clauses, one of which is an empty BooleanQuery with no clauses.

this does not appear to be "good" behavior.

In general, QueryParser should be smarter about what it does when it encounters parens whose contents result in an empty BooleanQuery -- but what exactly it should do in the following situations...

 a)  +foo:BBB +()
 b)  +foo:BBB ()
 c)  +foo:BBB -()

...is up for interpretation.  I would think situation (b) clearly lends itself to dropping the sub-BooleanQuery completely.  Situation (c) may also lend itself to that solution, since semantically it means "don't allow a match on any queries in the empty set of queries".  I have no idea what the "right" thing to do for situation (a) is.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


[jira] Commented: (LUCENE-933) QueryParser can produce empty sub BooleanQueries when Analyzer produces no tokens for input

ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506342 ]

Doron Cohen commented on LUCENE-933:
------------------------------------

>  a) +foo:BBB +()
>  I have no idea what the "right" thing to do for situation (a) is.

Interestingly, see TestQueryParser.testQPA():
      assertQueryEquals("term +stop term", qpAnalyzer, "term term");
      assertQueryEquals("term -stop term", qpAnalyzer, "term term");

So already today, requiring that a word W appear (or not appear) becomes a non-requirement when W is a stopword.

Currently, adding any of these would cause a failure:
    assertQueryEquals("term +(stop) term", qpAnalyzer, "term term");
    assertQueryEquals("term -(stop) term", qpAnalyzer, "term term");
    assertQueryEquals("term +(stop stop) term", qpAnalyzer, "term term");
    assertQueryEquals("term -(stop stop) term", qpAnalyzer, "term term");

I feel comfortable with applying the logic we have for a single (stop)word on a group of (stop)words, i.e. making the added lines pass.

Interestingly, consider this query:
      A  B +(+C -C)
Normally it would have no matches, because
     X AND NOT X == FALSE
but if C is a stopword, with the fixed(?) logic the query would become:
     A  B
and might have matches.
Now is that a glitch? I'd like to think not.
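
To make the group case concrete, here is a minimal toy sketch in plain Java. It is not Lucene code: the Clause type and the term, group, prune and render helpers are hypothetical names invented for illustration. It models dropping stopword terms and then dropping any group that ends up with no clauses, which is roughly what the added assertions above would require.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

/** Toy model only: these types are hypothetical and are not Lucene classes. */
final class StopGroupSketch {

    /** An occurrence prefix ("", "+" or "-") plus either a term or a nested group. */
    static final class Clause {
        final String occur;
        final String term;        // non-null for a term clause
        final List<Clause> group; // non-null for a parenthesised group

        Clause(String occur, String term, List<Clause> group) {
            this.occur = occur;
            this.term = term;
            this.group = group;
        }
    }

    static Clause term(String occur, String text) {
        return new Clause(occur, text, null);
    }

    static Clause group(String occur, Clause... clauses) {
        return new Clause(occur, null, Arrays.asList(clauses));
    }

    /** Drops stopword terms, then drops any group left with no clauses. */
    static List<Clause> prune(List<Clause> clauses, Set<String> stopWords) {
        List<Clause> kept = new ArrayList<>();
        for (Clause c : clauses) {
            if (c.term != null) {
                if (!stopWords.contains(c.term)) {
                    kept.add(c);
                }
            } else {
                List<Clause> inner = prune(c.group, stopWords);
                if (!inner.isEmpty()) {
                    kept.add(new Clause(c.occur, null, inner));
                }
            }
        }
        return kept;
    }

    /** Renders clauses in a query-string-like form, e.g. "A B +(+C -C)". */
    static String render(List<Clause> clauses) {
        StringBuilder sb = new StringBuilder();
        for (Clause c : clauses) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(c.occur);
            if (c.term != null) {
                sb.append(c.term);
            } else {
                sb.append('(').append(render(c.group)).append(')');
            }
        }
        return sb.toString();
    }
}
```

In this model, with C as the only stopword, pruning the clauses for "A B +(+C -C)" and rendering the result gives "A B", regardless of whether the group was required or prohibited.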


[jira] Commented: (LUCENE-933) QueryParser can produce empty sub BooleanQueries when Analyzer produces no tokens for input

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506351 ]

Hoss Man commented on LUCENE-933:
---------------------------------

> I feel comfortable with applying the logic we have for a single (stop)word on a group of
> (stop)words, i.e. making the added lines pass.

+1

> Interestingly, consider this query:
>       A  B +(+C -C)

perhaps an alternate way to view this problem would be to ask:  what should QueryParser do, if asked to parse this string...
        A B +()

...if the answer is "treat it like 'A B'" then i think we're okay with the approach you described above.  if the answer is "an empty query doesn't match anything, so requiring a match on a clause which is an empty query should result in the outer query matching nothing"  then we've got a problem ... mainly that it contradicts the example you cited from TestQueryParser.testQPA() if you replace "an empty query" in the previous statement with "a query on a stop word"

personally, i think it's okay to say "A  B +(+C -C)" == "A B" if the analyzer doesn't produce any tokens for C.


[jira] Assigned: (LUCENE-933) QueryParser can produce empty sub BooleanQueries when Analyzer produces no tokens for input

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen reassigned LUCENE-933:
----------------------------------

    Assignee: Doron Cohen


[jira] Commented: (LUCENE-933) QueryParser can produce empty sub BooleanQueries when Analyzer produces no tokens for input

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506703 ]

Doron Cohen commented on LUCENE-933:
------------------------------------

So an acceptable solution is:
  QueryParser will ignore empty clauses (e.g. '( )') that result from word filtering, the same as it already does for single words.

A straightforward fix is for QueryParser to avoid adding null (inner) queries to (outer) clause sets. (It makes sense, too.)

However, this has a side effect:
  For queries that become "empty" as a result of filtering (stopping), QueryParser would now return null.

This is an API semantics change, because applications that used to get a BooleanQuery with 0 clauses as the parse result would now get a null query.

Here is a closer look at the behavior change:

Original behavior:
   (1)  parse(" ")  == ParseException
   (2)  parse("( )")  == ParseException
   (3)  parse("stop") == " "    
        (actually a boolean query with 0 clauses)
   (4)  parse("(stop)")  == " "    
        (actually a boolean query with 0 clauses)
   (5)  parse("a stop b") == "a b"
   (6)  parse("a (stop) b") == "a () b"  
        (middle part is a boolean query with 0 clauses)
   (7)  parse("a ((stop)) b") == "a () b"
        (again middle part is a boolean query with 0 clauses)

Modified behavior:  
   (3)  parse("stop") == null
   (4)  parse("(stop)")  == null    
   (6)  parse("a (stop) b") == "a b"  
   (7)  parse("a ((stop)) b") == "a b"

I think the modified behavior is the right one - applications can test a query for being null and realize that it is a no-op.

However backwards compatibility is important - would this change break existing applications with annoying new NPEs?

As an alternative, the QueryParser parse() methods could be modified to return a phony empty BQ instead of null, for the sake of backwards compatibility.
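
The two options can be compared in a small toy sketch. This is plain Java, not Lucene code and not the contents of either patch: ToyQuery, term, group, parseNullify and parseCompatible are hypothetical names used only for illustration. A stopword yields a null query, groups skip null sub-queries, and the two top-level entry points differ only in what they hand back when nothing is left.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Toy model only: ToyQuery and the helper names are hypothetical, not Lucene's API. */
final class NullSkippingSketch {

    static final class ToyQuery {
        final String term;            // set for a leaf term, null for a boolean group
        final List<ToyQuery> clauses; // set for a boolean group, null for a leaf term

        private ToyQuery(String term, List<ToyQuery> clauses) {
            this.term = term;
            this.clauses = clauses;
        }

        @Override
        public String toString() {
            if (term != null) {
                return term;
            }
            StringBuilder sb = new StringBuilder();
            for (ToyQuery q : clauses) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(q.clauses != null ? "(" + q + ")" : q.toString());
            }
            return sb.toString();
        }
    }

    /** Mimics analysis: a stopword yields no query at all. */
    static ToyQuery term(String text, Set<String> stopWords) {
        return stopWords.contains(text) ? null : new ToyQuery(text, null);
    }

    /** Builds a group, skipping null sub-queries; returns null if nothing is left. */
    static ToyQuery group(ToyQuery... subQueries) {
        List<ToyQuery> kept = new ArrayList<>();
        for (ToyQuery q : subQueries) {
            if (q != null) {
                kept.add(q);
            }
        }
        return kept.isEmpty() ? null : new ToyQuery(null, kept);
    }

    /** "Nullify" variant: the caller may receive null and has to check for it. */
    static ToyQuery parseNullify(ToyQuery... topLevel) {
        return group(topLevel);
    }

    /** Backwards-compatible variant: an empty group is substituted for null. */
    static ToyQuery parseCompatible(ToyQuery... topLevel) {
        ToyQuery q = group(topLevel);
        return q != null ? q : new ToyQuery(null, new ArrayList<ToyQuery>());
    }
}
```

In this model, cases (6) and (7) come out as "a b" under both variants. The variants only differ for cases (3) and (4): parseNullify returns null, while parseCompatible returns a group with 0 clauses, so existing callers that dereference the result keep working.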

Thoughts?


[jira] Updated: (LUCENE-933) QueryParser can produce empty sub BooleanQueries when Analyzer produces no tokens for input

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-933:
-------------------------------

    Attachment: lucene-933_nullify.patch
                lucene-933_backwards_comapatible.patch

Ok attaching two different fixes (as discussed above)
  (1)  lucene-933_backwards_comapatible.patch
  (2)  lucene-933_nullify.patch

All tests pass with either of these.

The "nullify" approach requires more changes, especially in tests as well as in MemoryIndex. While making the fixes required for the tests to pass with this (nullifying) approach, I came to the conclusion that it is better to continue not returning null queries as the result of parsing; otherwise there will be lots of "noise".

So I would like to commit patch (1) - unless someone points out a problem that I missed.


[jira] Resolved: (LUCENE-933) QueryParser can produce empty sub BooleanQueries when Analyzer produces no tokens for input

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen resolved LUCENE-933.
--------------------------------

       Resolution: Fixed
    Lucene Fields: [Patch Available]  (was: [New])

Committed the backwards-compatible patch (parsed query is not null).


[jira] Commented: (LUCENE-933) QueryParser can produce empty sub BooleanQueries when Analyzer produces no tokens for input

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508054 ]

Hoss Man commented on LUCENE-933:
---------------------------------

woops ... sorry doron, i actually reviewed these patches the other day, but apparently i got sidetracked and never commented.

i think you made the right choice with the backwards_comapatible.patch


[jira] Commented: (LUCENE-933) QueryParser can produce empty sub BooleanQueries when Analyzer produces no tokens for input

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508058 ]

Doron Cohen commented on LUCENE-933:
------------------------------------

great, thanks Hoss!
