ComplexPhraseQueryParser isn't switching search terms to lowercase with StandardAnalyzer


Shifflett, David [USA]
Hi all,
Using the code snippet:
    ComplexPhraseQueryParser qp = new ComplexPhraseQueryParser("somefield", new StandardAnalyzer());
    String teststr = "\"Foo Bar\"~2";
    Query queryToSearch = qp.parse(teststr);
    System.out.println("Query : " + queryToSearch.toString());
    System.out.println("Type of query : " + queryToSearch.getClass().getSimpleName());

I am getting the output
    Query : "Foo Bar"~2
    Type of query : ComplexPhraseQuery

If I change teststr to "\"Foo Bar\""
I get
    Query : "Foo Bar"
    Type of query : ComplexPhraseQuery

If I change teststr to "Foo Bar"
I get
    Query : content:foo content:bar
    Type of query : BooleanQuery


In the first two cases I was expecting the search terms to be switched to lowercase.

Were the Foo and Bar left as originally specified because the terms are inside double quotes?

How can I specify a search term that I want treated as a Phrase,
but also have the query parser apply the LowerCaseFilter?

I am hoping to avoid the need to handle this using PhraseQuery,
and continue to use the QueryParser.


Thanks in advance for any help you can give me,
David Shifflett

Re: ComplexPhraseQueryParser isn't switching search terms to lowercase with StandardAnalyzer

baris.kazar
David,

Which version of Lucene are you using?

Best regards



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: [External] Re: ComplexPhraseQueryParser isn't switching search terms to lowercase with StandardAnalyzer

Shifflett, David [USA]
Baris,

Sorry, I neglected to mention that.
This test was run against Lucene 8.0.0,
but I also want it to work in later versions.

Another piece of my project uses 8.2.0.

Thanks again for any info,
David Shifflett


Re: [External] Re: ComplexPhraseQueryParser isn't switching search terms to lowercase with StandardAnalyzer

baris.kazar
I wonder whether this also happens in version 7.7.2?

Best regards


Re: ComplexPhraseQueryParser isn't switching search terms to lowercase with StandardAnalyzer

Mikhail Khludnev-2
In reply to this post by Shifflett, David [USA]
Hello,
I wonder how it came up with this particular field:
content:foo
Anyway, I added some uppercase letters to the test, and it still passed.

diff --git a/lucene/queryparser/src/test/org/apache/lucene/queryparser/complexPhrase/TestComplexPhraseQuery.java b/lucene/queryparser/src/test/org/apache/lucene/queryparser/complexPhrase/TestComplexPhraseQuery.java
index 5935da9..9baa492 100644
--- a/lucene/queryparser/src/test/org/apache/lucene/queryparser/complexPhrase/TestComplexPhraseQuery.java
+++ b/lucene/queryparser/src/test/org/apache/lucene/queryparser/complexPhrase/TestComplexPhraseQuery.java
@@ -55,8 +55,8 @@
   boolean inOrder = true;

   public void testComplexPhrases() throws Exception {
-    checkMatches("\"john smith\"", "1"); // Simple multi-term still works
-    checkMatches("\"j*   smyth~\"", "1,2"); // wildcards and fuzzies are OK in
+    checkMatches("\"John Smith\"", "1"); // Simple multi-term still works
+    checkMatches("\"J*   Smyth~\"", "1,2"); // wildcards and fuzzies are OK in
     // phrases
     checkMatches("\"(jo* -john)  smith\"", "2"); // boolean logic works
     checkMatches("\"jo*  smith\"~2", "1,2,3"); // position logic works.
@@ -161,11 +161,11 @@
     checkMatches("name:\"j*   smyth~\"", "1,2");
     checkMatches("role:\"developer\"", "1,2");
     checkMatches("role:\"p* manager\"", "4");
-    checkMatches("role:de*", "1,2,3");
+    checkMatches("role:De*", "1,2,3");
     checkMatches("name:\"j* smyth~\"~5", "1,2,3");
     checkMatches("role:\"p* manager\" AND name:jack*", "4");
     checkMatches("+role:developer +name:jack*", "");
-    checkMatches("name:\"john smith\"~2 AND role:designer AND id:3", "3");
+    checkMatches("name:\"john smith\"~2 AND role:Designer AND id:3", "3");
   }

   public void testToStringContainsSlop() throws Exception {

The problem seems rather odd (assuming ComplexPhraseQueryParser does analysis);
debugging looks like the only option left in this particular case.


--
Sincerely yours
Mikhail Khludnev

Re: [External] Re: ComplexPhraseQueryParser isn't switching search terms to lowercase with StandardAnalyzer

Shifflett, David [USA]
Mikhail,

Thanks for running those tests.
I haven’t looked into the test, but can you confirm it uses an analyzer with the lowercase filter?
Also, can you confirm whether the actual query being used contains upper or lower case J and S (in your John Smith case)?

Apologies for the 'content:foo'.
I changed the code snippet to "somefield", and missed changing that part of the output.

David Shifflett



Re: [External] Re: ComplexPhraseQueryParser isn't switching search terms to lowercase with StandardAnalyzer

Mikhail Khludnev-2
On Tue, Oct 22, 2019 at 5:26 PM Shifflett, David [USA] <
[hidden email]> wrote:

> Mikhail,
>
> Thanks for running those tests.
> I haven’t looked into the test, but can you confirm it uses an analyzer
> with the lowercase filter?
>
Look at this diff. It is a diff against an existing test, not a new test.

-    checkMatches("\"john smith\"", "1"); // Simple multi-term still works
-    checkMatches("\"j*   smyth~\"", "1,2"); // wildcards and fuzzies are OK in
+    checkMatches("\"John Smith\"", "1"); // Simple multi-term still works
+    checkMatches("\"J*   Smyth~\"", "1,2"); // wildcards and fuzzies are OK in

Here I switched to capital letters, and the queries still match what they
matched before in lowercase.
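
For readers following along, the check described above amounts to running the same phrase in lowercase and in mixed case and comparing the hits. Below is a minimal, Lucene-free sketch of that idea in plain Java. Everything in it is hypothetical: the sample documents, the `analyze` helper (a crude stand-in for an analyzer that splits on whitespace and lowercases) and the `matches` helper are made up for illustration and are not taken from TestComplexPhraseQuery or from Lucene.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model only: none of these names are Lucene classes or methods.
final class CaseCheckSketch {

    /** Crude analyzer stand-in: split on whitespace and lowercase each token. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    /** Ids of documents whose analyzed tokens contain the analyzed phrase as a contiguous run. */
    static List<String> matches(Map<String, String> docs, String phrase) {
        List<String> wanted = analyze(phrase);
        List<String> hits = new ArrayList<>();
        if (wanted.isEmpty()) {
            return hits; // an empty phrase matches nothing in this sketch
        }
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            if (Collections.indexOfSubList(analyze(doc.getValue()), wanted) >= 0) {
                hits.add(doc.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up sample documents, keyed by id.
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("1", "john smith");
        docs.put("2", "john percival smith");

        // The query text is analyzed the same way as the documents,
        // so the capitalization of the query does not change the hits.
        System.out.println(matches(docs, "john smith")); // prints [1]
        System.out.println(matches(docs, "John Smith")); // prints [1]
    }
}
```

The point of comparing hits rather than printed queries is that it tests what the search actually does with the terms, whatever the query object prints.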


--
Sincerely yours
Mikhail Khludnev

Re: [External] Re: ComplexPhraseQueryParser isn't switching search terms to lowercase with StandardAnalyzer

Shifflett, David [USA]
I saw the changes in the diff.
But since I have not looked into the test, I am asking you to confirm
whether it matches my conditions:
1) Does it use a StandardAnalyzer?
2) Does the actual query.toString() return lowercase J and S?

David Shifflett


Re: [External] Re: ComplexPhraseQueryParser isn't switching search terms to lowercase with StandardAnalyzer

Mikhail Khludnev-2
The removed attachment shows that ComplexPhraseQuery's toString() is simply misleading.
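
A toy illustration of how a query's printed form can differ from the terms it ends up searching for, in plain Java without Lucene. This is only a sketch under an assumption: that the query object keeps the phrase text as typed and applies analysis later, at rewrite time. The `RawPhrase` class and the `analyze` helper are hypothetical names invented here, not Lucene's implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Toy model only: these are NOT Lucene classes.
final class DeferredPhraseDemo {

    /** Crude analyzer stand-in: split on whitespace and lowercase each token. */
    static List<String> analyze(String text) {
        return Arrays.stream(text.trim().split("\\s+"))
                .map(t -> t.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    /** Hypothetical query object that keeps the raw phrase until rewrite time. */
    static final class RawPhrase {
        private final String raw;
        private final int slop;

        RawPhrase(String raw, int slop) {
            this.raw = raw;
            this.slop = slop;
        }

        /** Analysis happens here, not in the constructor. */
        List<String> rewrite() {
            return analyze(raw);
        }

        /** Prints the text as typed, so it says nothing about the analyzed terms. */
        @Override
        public String toString() {
            return "\"" + raw + "\"" + (slop > 0 ? "~" + slop : "");
        }
    }

    public static void main(String[] args) {
        RawPhrase q = new RawPhrase("Foo Bar", 2);
        System.out.println("toString : " + q);           // toString : "Foo Bar"~2
        System.out.println("rewritten: " + q.rewrite()); // rewritten: [foo, bar]
    }
}
```

In this model the uppercase letters in the printed query are harmless, because the terms used for matching come from `rewrite()` and not from `toString()`.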

--
Sincerely yours
Mikhail Khludnev

