[jira] Created: (LUCENE-682) QueryParser with Locale Based Operators (French included)


[jira] Created: (LUCENE-682) QueryParser with Locale Based Operators (French included)

JIRA jira@apache.org
QueryParser with Locale Based Operators (French included)
---------------------------------------------------------

                 Key: LUCENE-682
                 URL: http://issues.apache.org/jira/browse/LUCENE-682
             Project: Lucene - Java
          Issue Type: New Feature
          Components: QueryParser
            Reporter: Patrick Turcotte
            Priority: Minor
         Attachments: QueryParser.jj.patch, QueryParser.properties

Here is a version of the QueryParser that can "understand" the AND, OR and NOT keywords in other languages.

If activated,
- "a ET b" should return the same query as "a AND b", namely: "+a +b"
- "a OU b" should return the same query as "a OR b", namely: "a b"
- "a SAUF b" should return the same query as "a NOT b", namely: "a -b"
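
The equivalence in the three examples above can be illustrated without the parser itself. The following is only a minimal sketch, assuming a plain whitespace tokenizer and a hypothetical French table (the class and method names are made up for illustration); the actual patch works inside the JavaCC grammar, and this sketch ignores quoted phrases, fields and escaping.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: rewrites localized operator keywords to the English ones. */
class LocalizedKeywordSketch {
    // Hypothetical table mirroring the ET/OU/SAUF examples from the issue text.
    static final Map<String, String> FRENCH = new HashMap<>();
    static {
        FRENCH.put("ET", "AND");
        FRENCH.put("OU", "OR");
        FRENCH.put("SAUF", "NOT");
    }

    /** Replaces every whitespace-separated token found in the table; other tokens pass through. */
    static String toEnglishOperators(String query, Map<String, String> table) {
        StringBuilder out = new StringBuilder();
        for (String token : query.trim().split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            // The lookup is case-sensitive, so a lower-case "et" stays a regular term.
            out.append(table.getOrDefault(token, token));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toEnglishOperators("a ET b", FRENCH));
        System.out.println(toEnglishOperators("a SAUF b", FRENCH));
    }
}
```

Once the keywords are normalized this way, the unmodified QueryParser would be expected to produce the queries listed above ("+a +b", "a b", "a -b").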

Here are its main points:

1) Patched against revision 454774 of Lucene 2.1-dev (trunk); it could probably be applied to other versions
2) The "ant test" target still succeeds when the modified QueryParser is used
3) It doesn't break existing code
4) The default behavior is the same as before
5) It has to be deliberately activated
6) It uses ResourceBundle to find the keyword translations
7) Comes with a French translation
8) Comes with JUnit test cases
9) Adds 1 public method to QueryParser
10) Expands the TOKEN <TERM>
11) Uses TOKEN_MGR_DECLS to set some fields for the TokenManager


--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


[jira] Updated: (LUCENE-682) QueryParser with Locale Based Operators (French included)

JIRA jira@apache.org
     [ http://issues.apache.org/jira/browse/LUCENE-682?page=all ]

Patrick Turcotte updated LUCENE-682:
------------------------------------

    Attachment: QueryParser.jj.patch


[jira] Updated: (LUCENE-682) QueryParser with Locale Based Operators (French included)

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org
     [ http://issues.apache.org/jira/browse/LUCENE-682?page=all ]

Patrick Turcotte updated LUCENE-682:
------------------------------------

    Attachment: QueryParser.properties


[jira] Updated: (LUCENE-682) QueryParser with Locale Based Operators (French included)

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org
     [ http://issues.apache.org/jira/browse/LUCENE-682?page=all ]

Patrick Turcotte updated LUCENE-682:
------------------------------------

    Attachment: QueryParser_fr.properties
                TestQueryParserLocaleOperators.java
                LocalizedQueryParserDemo.java

Sorry, I didn't see the option to attach multiple files before.


[jira] Updated: (LUCENE-682) QueryParser with Locale Based Operators (French included)

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org
     [ http://issues.apache.org/jira/browse/LUCENE-682?page=all ]

Patrick Turcotte updated LUCENE-682:
------------------------------------

    Attachment: QueryParser.jj


[jira] Commented: (LUCENE-682) QueryParser with Locale Based Operators (French included)

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org
    [ http://issues.apache.org/jira/browse/LUCENE-682?page=comments#action_12444776 ]
           
Otis Gospodnetic commented on LUCENE-682:
-----------------------------------------

I like this and have a question.  The createLocalizedTokenMap() method is called from that new setter method.
Since QueryParser is not thread safe, one has to instantiate a new QP, set the Locale and call that setter before each parse(....) call.  Unless ResourceBundle does some internal caching, doesn't this mean each parsed query will execute that createLocalizedTokenMap() method?  Since the resource files are not likely to change, shouldn't we cache things?
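
One way to cache, sketched under assumptions: the token map is built once per Locale and shared by every new parser instance. Here buildTokenMap() is a hypothetical stand-in for the patch's createLocalizedTokenMap(), whose body is not shown in this thread, and the French entries are illustrative.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: a per-Locale cache shared by all parser instances. */
class TokenMapCacheSketch {
    private static final Map<Locale, Map<String, String>> CACHE = new ConcurrentHashMap<>();

    // Counts how often the expensive build step really runs (for demonstration).
    static final AtomicInteger BUILDS = new AtomicInteger();

    // Hypothetical stand-in for createLocalizedTokenMap(); real code would read a ResourceBundle.
    private static Map<String, String> buildTokenMap(Locale locale) {
        BUILDS.incrementAndGet();
        Map<String, String> map = new HashMap<>();
        if ("fr".equals(locale.getLanguage())) {
            map.put("ET", "AND");
            map.put("OU", "OR");
            map.put("SAUF", "NOT");
        }
        return Collections.unmodifiableMap(map);
    }

    /** Returns the cached map, building it only on the first request for a Locale. */
    static Map<String, String> tokenMapFor(Locale locale) {
        return CACHE.computeIfAbsent(locale, TokenMapCacheSketch::buildTokenMap);
    }

    public static void main(String[] args) {
        tokenMapFor(Locale.FRENCH);
        tokenMapFor(Locale.FRENCH);
        System.out.println("builds: " + BUILDS.get());
    }
}
```

Because the cached maps are unmodifiable, sharing them between threads is safe even though each QueryParser instance is not.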




Re: [jira] Commented: (LUCENE-682) QueryParser with Locale Based Operators (French included)

Patrek
From what I read in the Javadoc of ResourceBundle, it does cache the values.

Patrick Turcotte

On 10/26/06, Otis Gospodnetic (JIRA) <[hidden email]> wrote:

>
>     [
> http://issues.apache.org/jira/browse/LUCENE-682?page=comments#action_12444776]
>
> Otis Gospodnetic commented on LUCENE-682:
> -----------------------------------------
>
> I like this and have a question.  The createLocalizedTokenMap() method is
> called from that new setter method.
> Since QueryParser is not thread safe, one has to instantiate a new QP, set
> the Locale and call that setter before each parse(....) call.  Unless
> ResourceBundle does some internal caching, doesn't this mean each parsed
> query will execute that createLocalizedTokenMap() method?  Since the
> resource files are not likely to change, shouldn't we cache things?

Re: [jira] Created: (LUCENE-682) QueryParser with Locale Based Operators (French included)

Chris Hostetter-3
In reply to this post by JIRA jira@apache.org

: Here is a version of the QueryParser that can "understand" the AND, OR
: and NOT keyword in other languages.

I finally got a chance to skim this ... I don't know a lot about JavaCC or
ResourceBundles, but this looks really cool to me.  Mainly because it
looks like this would make it really easy to completely disable the use of
"AND" "OR" and "NOT" as well -- so they can be treated as regular terms
(at the moment there is no way to "escape" them) by directly modifying
andCases, orCases, and notCases (in a subclass, or through new methods)

        ...except...

...I don't see anything in the patch that would eliminate the use of "AND"
to mean *AND* if a ResourceBundle is used ... are the English operators
still in effect when the ResourceBundle operators are specified?

Also: I noticed you used ResourceBundle.getString for each of the keys,
and then put it in a single element ArrayList .. why not use
ResourceBundle.getStringArray so that the bundle can define multiple words
for each operator?  (note: we'd probably want to put them in a Set at
that point)



-Hoss



Re: [jira] Created: (LUCENE-682) QueryParser with Locale Based Operators (French included)

Patrek
>
> : Here is a version of the QueryParser that can "understand" the AND, OR
> : and NOT keyword in other languages.
>
> I finally got a chance to skim this ... I don't know a lot about JavaCC or
> ResourceBundles, but this looks really cool to me.  Mainly because it
> looks like this would make it really easy to completely disable the use of
> "AND" "OR" and "NOT" as well -- so they can be treated as regular terms
> (at the moment there is no way to "escape" them) by directly modifying
> andCases, orCases, and notCases (in a subclass, or through new methods)
>
>         ...except...
>
> ...I don't see anything in the patch that would eliminate the use of "AND"
> to mean *AND* if a ResourceBundle is used ... are the English operators
> still in effect when the ResourceBundle operators are specified?


That is on purpose, for now. The idea was to introduce a new functionality
without changing the way things worked before. But yes, it would be possible
to make it so that, if wanted, the English operators could be "disabled".

> Also: I noticed you used ResourceBundle.getString for each of the keys,
> and then put it in a single element ArrayList .. why not use
> ResourceBundle.getStringArray so that the bundle can define multiple words
> for each operator?  (note: we'd probably want to put them in a Set at
> that point)

Interesting idea.

Give me a few days, I'll take the time to submit a new version of the patch
with the suggested enhancements.

Patrick Turcotte

[jira] Commented: (LUCENE-682) QueryParser with Locale Based Operators (French included)

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org
    [ http://issues.apache.org/jira/browse/LUCENE-682?page=comments#action_12445062 ]
           
Patrick Turcotte commented on LUCENE-682:
-----------------------------------------

From what I read in the Javadoc of ResourceBundle, it does cache the values.


Re: [jira] Created: (LUCENE-682) QueryParser with Locale Based Operators (French included)

Chris Hostetter-3
In reply to this post by Patrek

: That is on purpose, for now. The idea was to introduce a new functionality
: without changing the way things worked before. But yes, it would be possible
: to make it so that, if wanted, the English operators could be "disabled".

Hmmm... I think I see what you mean, eliminating the English operators
from the grammar would currently require that an English bundle be used
... and you wanted the Locale/bundle specifications to be entirely
optional right?

as I said, I'm not very familiar with JavaCC. ... but couldn't the action
for AND, NOT, and OR gain a similar code block like the one you added for
TERM -- but would do the opposite based on useLocalizedOperators ? ... I'm
guessing it would be something like...

  <AND:       ("AND" | "&&") >
  {
        if (useLocalizedOperators) {
           matchedToken.kind = TERM;
        }
  }

...that way if localized operators are turned on, AND will be treated like
a regular term, but if they aren't the grammar is still the same as it
always was.  Right?

: Give me a few days, I'll take the time to submit a new version of the patch
: with the suggested enhancements.

That would be really great.  One of these days I need to dust off my
Flex/Yacc/Bison books and remind myself how parsers and grammars work so I
can help out more on stuff like this.


-Hoss



Re: [jira] Created: (LUCENE-682) QueryParser with Locale Based Operators (French included)

Patrek
Hi,

I'm making progress on the new version.

I'll keep the &&, || and ! operators always active whatever the language.

As for the bundle.getStringArray(key), it doesn't work for properties files.
A workaround to allow multiple words for each operator would be to put many
on the same line in the property file, as:

AND=ET;AUSSI;AVEC

I'll use a semicolon (;) as the separator in the properties file; it
shouldn't be used too often. Please tell me if I should use something else.
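
The semicolon workaround could look roughly like the sketch below. The class and method names are hypothetical, and only the "AND=ET;AUSSI;AVEC" line comes from this message; the real patch may read the value differently.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Properties;
import java.util.Set;

/** Illustrative only: several words per operator on one properties line. */
class MultiValueOperatorSketch {
    /** Splits the value stored under key on ';' and drops empty entries. */
    static Set<String> operatorWords(Properties props, String key) {
        Set<String> words = new LinkedHashSet<>();
        String raw = props.getProperty(key, "");
        for (String word : raw.split(";")) {
            String trimmed = word.trim();
            if (!trimmed.isEmpty()) {
                words.add(trimmed);
            }
        }
        return words;
    }

    /** Parses properties text from a string so the sketch needs no file on disk. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        System.out.println(operatorWords(parse("AND=ET;AUSSI;AVEC\n"), "AND"));
    }
}
```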

Patrick

Re: [jira] Created: (LUCENE-682) QueryParser with Locale Based Operators (French included)

Chris Hostetter-3

: I'll keep the &&, || and ! operators always active whatever the language.

that's probably wise

: As for the bundle.getStringArray(key), it doesn't work for properties files.
: A workaround to allow multiple words for each operator would be to put many
: on the same line in the property file, as:

: I'll make it using a semicolon (;) as a separator in the property file,
: shouldn't be used too often. Please tell me if I should use something else.

As I said, I don't really have a lot of experience with ResourceBundles,
but from what I can tell, I think you should just use getStringArray and
trust whatever list comes back (you get a 0 or 1 item list when using a
Properties file right?)

the level of abstraction you have is with the ResourceBundle API -- not
the Properties API, so you shouldn't assume that your users are providing
a PropertyResourceBundle in which you can/need-to split the strings you
get -- trust that the getStringArray method does the right thing, most
people will use properties files that return single values, and if anyone
wants multiple values they can write their own class implementing
ResourceBundle which returns multiple values.


if you *really* want to support multiple values in Properties fields ...
have a key which if set denotes the string value that all other values
should be split on, if not set, then no splitting is done (that way you
don't have to worry that you might pick a "split character" which winds up
being important to someone).
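
The opt-in separator idea can be sketched as follows. The key name "separator" and the class name are assumptions made for illustration; nothing in the thread fixes them.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;
import java.util.regex.Pattern;

/** Illustrative only: values are split only when the bundle names a separator. */
class ConfigurableSeparatorSketch {
    static final String SEPARATOR_KEY = "separator"; // hypothetical key name

    static List<String> operatorWords(ResourceBundle bundle, String key) {
        String value = bundle.getString(key);
        if (!bundle.containsKey(SEPARATOR_KEY)) {
            return Collections.singletonList(value); // no separator configured: no splitting
        }
        String separator = bundle.getString(SEPARATOR_KEY);
        if (separator.isEmpty()) {
            return Collections.singletonList(value);
        }
        // Pattern.quote keeps separators such as "|" or "." from being read as regex syntax.
        return Arrays.asList(value.split(Pattern.quote(separator)));
    }

    /** Builds a bundle from properties text so the sketch needs no file on the classpath. */
    static ResourceBundle bundleOf(String propertiesText) {
        try {
            return new PropertyResourceBundle(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(operatorWords(bundleOf("separator=;\nAND=ET;AUSSI\n"), "AND"));
        System.out.println(operatorWords(bundleOf("AND=ET;AUSSI\n"), "AND"));
    }
}
```

With this design a bundle that never sets the separator key behaves exactly like a single-value bundle, so no character is reserved by default.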

-Hoss



Re: [jira] Created: (LUCENE-682) QueryParser with Locale Based Operators (French included)

Patrek
> : I'll make it using a semicolon (;) as a separator in the property file,
> : shouldn't be used too often. Please tell me if I should use something else.
>
> As I said, I don't really have a lot of experience with ResourceBundles,
> but from what I can tell, I think you should just use getStringArray and
> trust whatever list comes back (you get a 0 or 1 item list when using a
> Properties file right?)


Unfortunately, no. If I use getStringArray while using a
PropertyResourceBundle to back the ResourceBundle, getStringArray(key)
throws a ClassCastException, as it is just a wrapper for: (String[])
getObject(key).

Worse, I tried implementing a ListResourceBundle and I also get a
ClassCastException on getStringArray()
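
The behaviour described here can be reproduced with a small sketch (class and method names are made up for illustration). Since getStringArray(key) casts the result of getObject(key) to String[], a properties-backed bundle, whose values are plain Strings, fails. A ListResourceBundle fails the same way when it stores a plain String, and only succeeds if the stored value is itself a String[].

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ListResourceBundle;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

/** Illustrative only: when does getStringArray() throw? */
class GetStringArraySketch {
    static boolean throwsClassCast(ResourceBundle bundle, String key) {
        try {
            bundle.getStringArray(key);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    /** Properties-backed bundle: every value is a String. */
    static ResourceBundle propertiesBundle() {
        try {
            return new PropertyResourceBundle(new StringReader("AND=ET\n"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** List-backed bundle holding whatever object is passed in under the key "AND". */
    static ResourceBundle listBundle(final Object value) {
        return new ListResourceBundle() {
            @Override
            protected Object[][] getContents() {
                return new Object[][] { { "AND", value } };
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(throwsClassCast(propertiesBundle(), "AND"));
        System.out.println(throwsClassCast(listBundle("ET"), "AND"));
        System.out.println(throwsClassCast(listBundle(new String[] { "ET", "AUSSI" }), "AND"));
    }
}
```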

> the level of abstraction you have is with the ResourceBundle API -- not
> the Properties API, so you shouldn't assume that your users are providing
> a PropertyResourceBundle in which you can/need-to split the strings you
> get -- trust that the getStringArray method does the right thing, most
> people will use properties files that return single values, and if anyone
> wants multiple values they can write their own class implementing
> ResourceBundle which returns multiple values.


Knowing what I said previously, should I still call getStringArray, catch
the exception and then fall back to getString? That seems costly to me, as most
users will probably use a PropertyResourceBundle to store values. What do you think?
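
One possible answer that avoids the exception cost, offered only as a sketch and not as what the patch does: call getObject(key) once and branch on the runtime type. The class and helper names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.ListResourceBundle;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

/** Illustrative only: accept either a String or a String[] without try/catch. */
class BundleValueSketch {
    static List<String> operatorWords(ResourceBundle bundle, String key) {
        Object value = bundle.getObject(key);
        if (value instanceof String[]) {
            return Arrays.asList((String[]) value); // custom bundle with several words
        }
        return Collections.singletonList(value.toString()); // usual properties case
    }

    static ResourceBundle propertiesBundle(String text) {
        try {
            return new PropertyResourceBundle(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static ResourceBundle arrayBundle(final String key, final String... words) {
        return new ListResourceBundle() {
            @Override
            protected Object[][] getContents() {
                return new Object[][] { { key, words } };
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(operatorWords(propertiesBundle("AND=ET\n"), "AND"));
        System.out.println(operatorWords(arrayBundle("AND", "ET", "AUSSI"), "AND"));
    }
}
```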

> if you *really* want to support multiple values in Properties fields ...
> have a key which if set denotes the string value that all other values
> should be split on, if not set, then no splitting is done (that way you
> don't have to worry that you might pick a "split character" which winds up
> being important to someone).


That's right, even if there aren't many characters left that don't have a
meaning.

Patrick

Re: [jira] Created: (LUCENE-682) QueryParser with Locale Based Operators (French included)

Chris Hostetter-3

: Unfortunately, no. If I use getStringArray while using a
: PropertyResourceBundle to back the ResourceBundle, getStringArray(key)
: throws a ClassCastException, as it is just a wrapper for: (String[])
: getObject(key).

Hmmm... so apparently ResourceBundles suck.  good to know.

: Knowing what I said previously, should I still call getStringArray, catch
: the exception and then fall back to getString? That seems costly to me, as most
: users will probably use a PropertyResourceBundle to store values. What do you think?

i retract all of my previous suggestions about getStringArray ... go with
your gut, do the simplest thing that works.



-Hoss



[jira] Updated: (LUCENE-682) QueryParser with Locale Based Operators (French included)

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org
     [ http://issues.apache.org/jira/browse/LUCENE-682?page=all ]

Patrick Turcotte updated LUCENE-682:
------------------------------------

    Attachment: LocalizedQueryParser.zip

New versions. All in the zip file. Improvements are:

- By default, using a localized version disables the English tokens (AND, NOT, OR)
- More than one word may be defined per operator in the bundle (separated by ';')
- The &&, || and ! operators are always active.
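
The three rules above can be summarized in a small classifier. This is a sketch under assumptions, not the code in the zip file: the enum, the method name and the French word lists are hypothetical, and the real decision happens inside the JavaCC token manager.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative only: which kind does a token get under the new rules? */
class OperatorClassifierSketch {
    enum Kind { AND, OR, NOT, TERM }

    // Hypothetical French word lists; several words per operator are allowed.
    private static final Map<Kind, Set<String>> FRENCH = new EnumMap<>(Kind.class);
    static {
        FRENCH.put(Kind.AND, new HashSet<>(Arrays.asList("ET", "AUSSI", "AVEC")));
        FRENCH.put(Kind.OR, new HashSet<>(Arrays.asList("OU")));
        FRENCH.put(Kind.NOT, new HashSet<>(Arrays.asList("SAUF")));
    }

    static Kind classify(String token, boolean useLocalizedOperators) {
        // The symbolic operators stay active whatever the language.
        switch (token) {
            case "&&": return Kind.AND;
            case "||": return Kind.OR;
            case "!":  return Kind.NOT;
            default:   break;
        }
        if (!useLocalizedOperators) {
            // Default behavior: only the English keywords are operators.
            switch (token) {
                case "AND": return Kind.AND;
                case "OR":  return Kind.OR;
                case "NOT": return Kind.NOT;
                default:    return Kind.TERM;
            }
        }
        // Localized mode: English keywords fall through and become regular terms.
        for (Map.Entry<Kind, Set<String>> entry : FRENCH.entrySet()) {
            if (entry.getValue().contains(token)) {
                return entry.getKey();
            }
        }
        return Kind.TERM;
    }

    public static void main(String[] args) {
        System.out.println(classify("AND", true));
        System.out.println(classify("ET", true));
        System.out.println(classify("&&", true));
    }
}
```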

A note on ResourceBundle, as I had to run some tests to understand the documentation:

1) getStringArray() is really just a wrapper around (String[]) getObject() and throws an exception with PropertyResourceBundle and ListResourceBundle.

2) The order in which bundles are looked up is a little tricky. The Javadoc should be read as:

    * baseName + "_" + selectedLanguage + "_" + country1 + "_" + variant1
    * baseName + "_" + selectedLanguage + "_" + country1
    * baseName + "_" + selectedLanguage
    * baseName + "_" + defaultLocaleLanguage + "_" + country2 + "_" + variant2
    * baseName + "_" + defaultLocaleLanguage + "_" + country2
    * baseName + "_" + defaultLocaleLanguage
    * baseName

Which means that, when no bundle matches the selected language, a bundle that exists under that baseName for your default locale (Locale.getDefault()) supplies the values instead of the baseName bundle.
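
To make the lookup order concrete, the sketch below generates the candidate names in the order listed in this message. It is a simplification with hypothetical names: it only mirrors the seven-line list and does not reproduce everything ResourceBundle.getBundle() does.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Illustrative only: candidate bundle names, most specific first. */
class BundleLookupOrderSketch {
    /** Adds base_lang_country_variant, base_lang_country and base_lang, skipping empty parts. */
    private static void addChain(Set<String> names, String baseName, Locale locale) {
        String lang = locale.getLanguage();
        String country = locale.getCountry();
        String variant = locale.getVariant();
        if (lang.isEmpty()) {
            return; // simplification: locales without a language are ignored here
        }
        if (!country.isEmpty() && !variant.isEmpty()) {
            names.add(baseName + "_" + lang + "_" + country + "_" + variant);
        }
        if (!country.isEmpty()) {
            names.add(baseName + "_" + lang + "_" + country);
        }
        names.add(baseName + "_" + lang);
    }

    /** Selected locale first, then the default locale, then the bare base name. */
    static List<String> candidateNames(String baseName, Locale selected, Locale defaultLocale) {
        Set<String> names = new LinkedHashSet<>(); // keeps order, drops duplicates
        addChain(names, baseName, selected);
        addChain(names, baseName, defaultLocale);
        names.add(baseName);
        return new ArrayList<>(names);
    }

    public static void main(String[] args) {
        for (String name : candidateNames("QueryParser", Locale.CANADA_FRENCH, Locale.getDefault())) {
            System.out.println(name);
        }
    }
}
```

For a selected locale of fr_CA and a default locale of en_US, QueryParser_en would be tried before the plain QueryParser bundle, which is the surprise described above.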

Thanks to Chris Hostetter and Otis Gospodnetic for your comments and questions.




[jira] Updated: (LUCENE-682) QueryParser with Locale Based Operators (French included)

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org
     [ http://issues.apache.org/jira/browse/LUCENE-682?page=all ]

Hoss Man updated LUCENE-682:
----------------------------

    Attachment: LocalizedQueryParser.patch

LocalizedQueryParser.patch contains everything in the most recent LocalizedQueryParser.zip, except in patch form, with a few minor changes...
  1) moved files into tree as appropriate
  2) reformatted (\t to 2 spaces, license header in test)
  3) included javacc generated QueryParser*.java since they
       need to be committed too.
  4) added <copy> directive to build.xml so property files
      would make it into jar (and classpath for tests)

The code looks *great* and the unit tests provide nice coverage.

Like Otis, I'm also a little worried about the createLocalizedTokenMap() calls ... the ResourceBundle lookups *may* be cached, but there's also the splitting, and the fact that createLocalizedTokenMap() is called in both setLocale and setUseLocalizedOperators ... setLocale might be used just for the date parsing, so that could result in wasted cycles for some people ... not to mention setBundleBaseName *doesn't* call createLocalizedTokenMap so people who do...

   QueryParser qp = new QueryParser(...);
   qp.setLocale(...);
   qp.setUseLocalizedOperators(true);
   qp.setBundleBaseName(...);

...will wind up triggering createLocalizedTokenMap twice, and still not use the Bundle they wanted to.

Even if we don't want to worry about caching the post-split arrays per Bundle, I think it might make more sense if setUseLocalizedOperators was the only method that called createLocalizedTokenMap, and it was documented that it should be called *after* setLocale and setBundleBaseName.
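
For illustration only: a rough sketch of that arrangement, where the two plain setters just store their values and only setUseLocalizedOperators rebuilds the map. LocalizedSettingsSketch is a hypothetical stand-in, not the patched QueryParser. The default base name, the rebuilds counter and the "source" entry are assumptions that replace the real ResourceBundle lookup and splitting.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class LocalizedSettingsSketch {
  private Locale locale = Locale.getDefault();
  private String bundleBaseName = "QueryParser"; // assumed default base name
  private boolean useLocalizedOperators = false;
  private Map<String, String> tokenMap = new HashMap<String, String>();
  int rebuilds = 0; // exposed only so the example can count map builds

  // Cheap setters: they record the value and nothing else.
  void setLocale(Locale locale) {
    this.locale = locale;
  }

  void setBundleBaseName(String bundleBaseName) {
    this.bundleBaseName = bundleBaseName;
  }

  // The only method that (re)builds the map.
  // Call it after setLocale and setBundleBaseName.
  void setUseLocalizedOperators(boolean use) {
    this.useLocalizedOperators = use;
    if (use) {
      tokenMap = createLocalizedTokenMap();
    } else {
      tokenMap = new HashMap<String, String>();
    }
  }

  boolean isUseLocalizedOperators() {
    return useLocalizedOperators;
  }

  // Stand-in for the real bundle lookup and ';' splitting.
  private Map<String, String> createLocalizedTokenMap() {
    rebuilds++;
    Map<String, String> map = new HashMap<String, String>();
    map.put("source", bundleBaseName + "_" + locale.getLanguage());
    return map;
  }

  /** Name of the bundle the current map was built from, or null if disabled. */
  String tokenSource() {
    return tokenMap.get("source");
  }

  public static void main(String[] args) {
    LocalizedSettingsSketch settings = new LocalizedSettingsSketch();
    settings.setLocale(Locale.FRENCH);
    settings.setBundleBaseName("MyOperators"); // hypothetical bundle name
    settings.setUseLocalizedOperators(true);
    System.out.println(settings.tokenSource() + " built " + settings.rebuilds + " time(s)");
  }
}
```

With this layout a caller who only uses setLocale for date parsing never pays for a map build, at the cost of having to document the call order.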

   ...any other opinions on this?

Other things that should probably be done before committing:

  a) have at least one test using a Bundle that exercises
      the splitting.
  b) add some javadoc verbiage to QueryParser.setLocale
      clarifying how it affects the operators.


...I may be able to find some time to play with this some more later this week and make those changes myself.





[jira] Assigned: (LUCENE-682) QueryParser with Locale Based Operators (French included)

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org
     [ http://issues.apache.org/jira/browse/LUCENE-682?page=all ]

Hoss Man reassigned LUCENE-682:
-------------------------------

    Assignee: Hoss Man




[jira] Commented: (LUCENE-682) QueryParser with Locale Based Operators (French included)

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org
    [ http://issues.apache.org/jira/browse/LUCENE-682?page=comments#action_12451351 ]
           
Yonik Seeley commented on LUCENE-682:
-------------------------------------

Does anyone know what the likely performance impact is when *not* using this feature? It's not easy for me to tell at a glance.




Re: [jira] Commented: (LUCENE-682) QueryParser with Locale Based Operators (French included)

Chris Hostetter-3

: Does anyone know what the likely performance impact is when *not*
: using this feature? It's not easy for me to tell at a glance.

Assuming we get rid of the createLocalizedTokenMap call in setLocale, it
should be relatively minor: one extra boolean test of
useLocalizedOperators on each TERM parsed. ... except ... hmmmm....

There is something else going on.  As a result of the patch, JavaCC seems
to have decided that the generated source for QueryParserTokenManager
needs to maintain a "StringBuffer image" that gets continuously appended
to, but never seems to be used by anything.

Manually removing the autogenerated declaration and references to that
StringBuffer doesn't seem to adversely affect anything ... but I have no
idea why JavaCC decided to put it there.

Any JavaCC gurus out there understand what it's doing?




-Hoss

