[jira] Created: (LUCENE-682) QueryParser with Locale Based Operators (French included)

[jira] Updated: (LUCENE-682) QueryParser with Locale Based Operators (French included)

Michael Gibney (Jira)
     [ http://issues.apache.org/jira/browse/LUCENE-682?page=all ]

Hoss Man updated LUCENE-682:
----------------------------

    Attachment: LocalizedQueryParser.patch

Revised version of the patch -- includes the changes so that only one method creates the lists, a test of the splitting logic, and more javadocs clarifying the interaction of the methods.

> QueryParser with Locale Based Operators (French included)
> ---------------------------------------------------------
>
>                 Key: LUCENE-682
>                 URL: http://issues.apache.org/jira/browse/LUCENE-682
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: QueryParser
>            Reporter: Patrick Turcotte
>         Assigned To: Hoss Man
>            Priority: Minor
>         Attachments: LocalizedQueryParser.patch, LocalizedQueryParser.patch, LocalizedQueryParser.zip, LocalizedQueryParserDemo.java, QueryParser.jj, QueryParser.jj.patch, QueryParser.properties, QueryParser_fr.properties, TestQueryParserLocaleOperators.java
>
>
> Here is a version of the QueryParser that can "understand" the AND, OR and NOT keywords in other languages.
> If activated,
> - "a ET b" should return the same query as "a AND b", namely: "+a +b"
> - "a OU b" should return the same query as "a OR b", namely: "a b"
> - "a SAUF b" should return the same query as "a NOT b", namely: "a -b"
> Here are its main points:
> 1) Patched from revision 454774 of lucene 2.1dev (trunk) (could probably be used with other versions)
> 2) The "ant test" target still succeeds when the modified QueryParser is used
> 3) It doesn't break existing code
> 4) The default behavior is the same as before
> 5) It has to be deliberately activated
> 6) It uses ResourceBundle to find the keyword translations
> 7) Comes with a French translation
> 8) Comes with JUnit test cases
> 9) Adds 1 public method to QueryParser
> 10) Expands the TOKEN <TERM>
> 11) Uses TOKEN_MGR_DECLS to set some fields for the TokenManager
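
The ResourceBundle lookup mentioned in point 6 might work roughly like the sketch below. This is an illustration only, not the code from the patch: the class name, the property keys (AND, OR, NOT) and the in-memory bundle are assumptions, and the attached QueryParser_fr.properties may be laid out differently. A real setup would presumably load the properties file by locale rather than from a string.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

final class OperatorBundleSketch {
    // Hypothetical bundle contents; the keys are assumed, not taken from the patch.
    static final String FRENCH_PROPERTIES = "AND=ET\nOR=OU\nNOT=SAUF\n";

    /** Parses the in-memory properties text into a ResourceBundle. */
    static ResourceBundle frenchBundle() {
        try {
            return new PropertyResourceBundle(new StringReader(FRENCH_PROPERTIES));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a lookup from localized keyword (e.g. ET) to canonical operator (AND). */
    static Map<String, String> localizedToCanonical(ResourceBundle bundle) {
        Map<String, String> map = new HashMap<>();
        for (String canonical : new String[] {"AND", "OR", "NOT"}) {
            if (bundle.containsKey(canonical)) {
                map.put(bundle.getString(canonical), canonical);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> ops = localizedToCanonical(frenchBundle());
        System.out.println("ET -> " + ops.get("ET"));
    }
}
```

Keeping the canonical operator as the property key means a missing translation simply leaves that operator untranslated instead of failing.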

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

[jira] Updated: (LUCENE-682) QueryParser with Locale Based Operators (French included)

Michael Gibney (Jira)
     [ http://issues.apache.org/jira/browse/LUCENE-682?page=all ]

Hoss Man updated LUCENE-682:
----------------------------

    Attachment: LocalizedQueryParserOperatorsMicroBench.java

Microbenchmark of the "default" (no ResourceBundle) usage, run against the current trunk and with this change, to determine the performance cost of the added work in the JavaCC-generated code.

Two tests: one creating a new instance for each parse call, one reusing the same instance for 5 parse calls; 3 runs each, 10000 iterations each, time in seconds...

    parse calls per instance:      1      1      1      5      5      5
    trunk:                     1.897  1.904  1.913  7.415  7.447  7.446
    w/patch:                   2.01   2.005  2.01   7.851  7.888  7.886

...doesn't seem like a big enough factor to worry about (unless I missed something obvious when I wrote the test ... I was on a plane at the time and slightly distracted by the very chatty woman next to me)
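
For reference, the relative overhead implied by these timings can be worked out by averaging the three runs in each column group. The sketch below only does arithmetic on the figures posted above; the class and method names are made up for illustration.

```java
import java.util.Locale;

final class OverheadSketch {
    // Timings in seconds, copied from the table above.
    static final double[] TRUNK_1 = {1.897, 1.904, 1.913};
    static final double[] PATCH_1 = {2.01, 2.005, 2.01};
    static final double[] TRUNK_5 = {7.415, 7.447, 7.446};
    static final double[] PATCH_5 = {7.851, 7.888, 7.886};

    /** Arithmetic mean of the given values. */
    static double mean(double... xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    /** Slowdown of the patched runs relative to the baseline, in percent. */
    static double overheadPercent(double[] baseline, double[] patched) {
        return (mean(patched) / mean(baseline) - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        System.out.println(String.format(Locale.ROOT,
            "1 parse per instance:  %.1f%%", overheadPercent(TRUNK_1, PATCH_1)));
        System.out.println(String.format(Locale.ROOT,
            "5 parses per instance: %.1f%%", overheadPercent(TRUNK_5, PATCH_5)));
    }
}
```

On these numbers the patched parser is about 5.4% slower with one parse per instance and about 5.9% slower with five.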



[jira] Updated: (LUCENE-682) QueryParser with Locale Based Operators (French included)

Michael Gibney (Jira)
     [ http://issues.apache.org/jira/browse/LUCENE-682?page=all ]

Hoss Man updated LUCENE-682:
----------------------------

    Lucene Fields: [Patch Available]

This issue has a modest number of notes, and it seems to me like it would be very useful as more language property files are contributed ... does anyone have any objections to it being committed?

NOTE: there are some negative performance impacts to QueryParser as a result of this patch.


[jira] Commented: (LUCENE-682) QueryParser with Locale Based Operators (French included)

Michael Gibney (Jira)
    [ http://issues.apache.org/jira/browse/LUCENE-682?page=comments#action_12456258 ]
           
Yonik Seeley commented on LUCENE-682:
-------------------------------------

Frankly, I'm not excited about a 6% performance loss so that someone can customize a total of 3 tokens that don't add additional expressive power or features.  AND, OR, and NOT are short and easy to understand even for foreign-language speakers.  Consider that to construct raw Lucene queries themselves, they would need to know Lucene, and for that, they will most likely have a passing familiarity with English anyway.

I think this would be better implemented as a preprocessor, outside of the query parser. I don't think that would be too hard, and then there would be no performance impact for the 99% of people who will stick with AND/OR/NOT (or +/-).

It might even be expressible as a regular expression.

Maybe it's just me though, so I wouldn't mind hearing some other opinions.
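
The regular-expression idea can be sketched roughly as follows. This is an illustration only, not code from the patch or from this thread: the class name, the hard-coded French keyword map, and the decision to leave double-quoted phrases and backslash-escaped characters untouched are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexOperatorPreprocessor {
    // Assumed French keywords, taken from the examples in the issue description.
    static final Map<String, String> FRENCH = new HashMap<>();
    static {
        FRENCH.put("ET", "AND");
        FRENCH.put("OU", "OR");
        FRENCH.put("SAUF", "NOT");
    }

    // Alternatives, tried in order at each position:
    //   1. a backslash-escaped character (copied through untouched)
    //   2. a complete double-quoted phrase (copied through untouched)
    //   3. a standalone keyword, captured in group 1
    private static final Pattern TOKEN = Pattern.compile(
        "\\\\."
        + "|\"(?:\\\\.|[^\"\\\\])*\""
        + "|(?<![\\p{L}\\p{N}_])(ET|OU|SAUF)(?![\\p{L}\\p{N}_])",
        Pattern.DOTALL);

    /** Rewrites localized operators to AND/OR/NOT before the query is parsed. */
    static String translate(String query) {
        Matcher m = TOKEN.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String keyword = m.group(1);
            // Escapes and phrases have no group 1 and are re-emitted as matched.
            String replacement = (keyword != null) ? FRENCH.get(keyword) : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("a ET b")); // prints: a AND b
    }
}
```

The output would then be handed to the unmodified QueryParser, so users who stick with AND/OR/NOT pay nothing. An unterminated quote is not treated as a phrase in this sketch, so keywords after it are still translated.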


[jira] Commented: (LUCENE-682) QueryParser with Locale Based Operators (French included)

Michael Gibney (Jira)
    [ http://issues.apache.org/jira/browse/LUCENE-682?page=comments#action_12456294 ]
           
Yonik Seeley commented on LUCENE-682:
-------------------------------------



Something like this perhaps:

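  // Scans s once and replaces each standalone AND, OR or NOT found outside
  // single-quoted strings with the supplied strings; returns s itself if
  // no operator was found.
  public static String change(String s, String AND, String OR, String NOT) {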
    int len = s.length();
    StringBuilder b = new StringBuilder();
    boolean newString=false;
    boolean changed=false;
    boolean inString=false;
    char prev='!';
    for (int i=0; i<s.length(); i++) {
     char ch = s.charAt(i);
     switch (ch) {
       case '\\' : b.append(s.charAt(++i)); break;
       case '\'' : inString=!inString; break;
       case 'A' :
         if (!inString
             && !Character.isJavaIdentifierPart(prev)
             && i+2 < s.length()
             && s.charAt(i+1) == 'N'
             && s.charAt(i+2) == 'D'
             && (i+3==s.length() || !Character.isJavaIdentifierPart(s.charAt(i+3))))
         {
           b.append(AND);
           changed=true;
           i+=2;
         }
         break;
       case 'O' :
         if (!inString
             && !Character.isJavaIdentifierPart(prev)
             && i+1 < s.length()
             && s.charAt(i+1) == 'R'
             && (i+2==s.length() || !Character.isJavaIdentifierPart(s.charAt(i+2))))
         {
           b.append(OR);
           changed=true;
           i+=1;
         }
         break;
       case 'N' :
         if (!inString
             && !Character.isJavaIdentifierPart(prev)
             && i+2 < s.length()
             && s.charAt(i+1) == 'O'
             && s.charAt(i+2) == 'T'
             && (i+3==s.length() || !Character.isJavaIdentifierPart(s.charAt(i+3))))
         {
           b.append(NOT);
           changed=true;
           i+=2;          
         }
         break;
       default: break;
     }
     if (changed) {
       newString=true;
       changed=false;
     } else {
       b.append(ch);
       prev = ch;
     }
    }
    return newString ? b.toString() : s;
  }


[jira] Commented: (LUCENE-682) QueryParser with Locale Based Operators (French included)

Michael Gibney (Jira)
    [ http://issues.apache.org/jira/browse/LUCENE-682?page=comments#action_12456295 ]
           
Yonik Seeley commented on LUCENE-682:
-------------------------------------

That's untested code of course... I just noticed that
       case '\\' : b.append(s.charAt(++i)); break;
       case '\'' : inString=!inString; break;
should probably be
       case '\\': if (++i<len) {b.append(ch); ch=s.charAt(i);} break;
       case '\'' : inString=!inString; break;

It can probably be made faster too... but the point is that it's independent of the query parser and more easily customized.
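
With the escape handling corrected, a standalone preprocessor along these lines might look like the sketch below. It is a rewrite for illustration, not the code posted above: it maps localized keywords back to AND/OR/NOT through a hypothetical map, reads whole words instead of testing individual characters, and assumes double quotes delimit phrases (as in Lucene query syntax) rather than the single quotes used in the posted snippet.

```java
import java.util.HashMap;
import java.util.Map;

final class OperatorScanner {
    // Assumed French keywords, taken from the examples in the issue description.
    static final Map<String, String> FRENCH = new HashMap<>();
    static {
        FRENCH.put("ET", "AND");
        FRENCH.put("OU", "OR");
        FRENCH.put("SAUF", "NOT");
    }

    /** Replaces whole words found in ops, skipping escapes and quoted phrases. */
    static String translate(String s, Map<String, String> ops) {
        int len = s.length();
        StringBuilder out = new StringBuilder(len);
        boolean inPhrase = false;
        int i = 0;
        while (i < len) {
            char ch = s.charAt(i);
            if (ch == '\\') {
                // Copy the backslash and the escaped character verbatim.
                out.append(ch);
                if (i + 1 < len) {
                    out.append(s.charAt(i + 1));
                }
                i += 2;
            } else if (ch == '"') {
                inPhrase = !inPhrase;
                out.append(ch);
                i++;
            } else if (Character.isJavaIdentifierPart(ch)) {
                // Read the maximal run of word characters and look it up.
                int end = i;
                while (end < len && Character.isJavaIdentifierPart(s.charAt(end))) {
                    end++;
                }
                String word = s.substring(i, end);
                String replacement = inPhrase ? null : ops.get(word);
                out.append(replacement != null ? replacement : word);
                i = end;
            } else {
                out.append(ch);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("a SAUF b", FRENCH)); // prints: a NOT b
    }
}
```

Because the keyword map is passed in, supporting another language only requires a different map, and the query parser itself stays untouched.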
