contrib/queryParsers/surround

Paul Elschot
Dear readers,

I've started moving the surround query language
http://issues.apache.org/bugzilla/show_bug.cgi?id=34331
into the directory named by the title in my working copy of the lucene
trunk. When the tests pass I'll repost it there.
In case someone  needs this earlier, please holler.

Regards,
Paul Elschot


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: contrib/queryParsers/surround

Erik Hatcher

On May 28, 2005, at 10:04 AM, Paul Elschot wrote:
> Dear readers,
>
> I've started moving the surround query language
> http://issues.apache.org/bugzilla/show_bug.cgi?id=34331
> into the directory named by the title in my working copy of the lucene
> trunk. When the tests pass I'll repost it there.
> In case someone  needs this earlier, please holler.

As for naming conventions and where this should live in contrib,  
consider that a user will only want a single query parser and more  
than that would be unneeded bloat in her application.  The contrib  
pieces are all packaged as a separate JAR per directory under contrib.

My recommendation would be to put your wonderful surround parser and  
supporting infrastructure under contrib/surround.

I'm very much looking forward to having this available!

     Erik




Re: contrib/queryParsers/surround

Paul Elschot
On Saturday 28 May 2005 17:06, Erik Hatcher wrote:

>
> On May 28, 2005, at 10:04 AM, Paul Elschot wrote:
> > Dear readers,
> >
> > I've started moving the surround query language
> > http://issues.apache.org/bugzilla/show_bug.cgi?id=34331
> > into the directory named by the title in my working copy of the lucene
> > trunk. When the tests pass I'll repost it there.
> > In case someone  needs this earlier, please holler.
>
> As for naming conventions and where this should live in contrib,  
> consider that a user will only want a single query parser and more  
> than that would be unneeded bloat in her application.  The contrib  
> pieces are all packaged as a separate JAR per directory under contrib.
>
> My recommendation would be to put your wonderful surround parser and  
> supporting infrastructure under contrib/surround.
>
> I'm very much looking forward to having this available!

Meanwhile the tests pass again with some expected standard output.

A little bit of deprecation is left in the CharStream (getLine and
getColumn) in the parser. Would you have any idea how to deal with that?

I'll leave the build.xml stand alone with constants for the environment.
It was derived from a lucene build.xml of a few eons ago, so
I hope someone can still integrate it...

Regards,
Paul Elschot



Re: contrib/queryParsers/surround

Wolfgang Hoschek
Cool stuff. Once this has stabilized and settled down I might start  
exposing the surround language from XQuery/XPath as an experimental  
match facility.

Wolfgang.

On May 28, 2005, at 10:07 AM, Paul Elschot wrote:




> Meanwhile the tests pass again with some expected standard output.
>
> A little bit of deprecation is left in the CharStream (getLine and
> getColumn) in the parser. Would you have any idea how to deal with  
> that?
>
> I'll leave the build.xml stand alone with constants for the  
> environment.
> It was derived from a lucene build.xml of a few eons ago, so
> I hope someone can still integrate it...
>
> Regards,
> Paul Elschot

Re: contrib/queryParsers/surround

Erik Hatcher
In reply to this post by Paul Elschot

On May 28, 2005, at 1:07 PM, Paul Elschot wrote:
> A little bit of deprecation is left in the CharStream (getLine and
> getColumn) in the parser. Would you have any idea how to deal with  
> that?

This is due to Java 1.5, right?  I'm seeing the same thing in my  
project but haven't looked into it yet.

> I'll leave the build.xml stand alone with constants for the  
> environment.
> It was derived from a lucene build.xml of a few eons ago, so
> I hope someone can still integrate it...

I will be happy to integrate it.  Let me know when it's ready for  
contrib committing and consider it done.

Sure you can't become a committer, even just on a contrib/surround  
tree?  :)

     Erik



Re: contrib/queryParsers/surround

Paul Elschot
On Saturday 28 May 2005 21:26, Erik Hatcher wrote:
>
> On May 28, 2005, at 1:07 PM, Paul Elschot wrote:
> > A little bit of deprecation is left in the CharStream (getLine and
> > getColumn) in the parser. Would you have any idea how to deal with  
> > that?
>
> This is due to Java 1.5, right?  I'm seeing the same thing in my  
> project but haven't looked into it yet.

I used source="1.4" and target="1.4" for javac in build.xml,
so I don't think so.

Regards,
Paul Elschot



Re: contrib/surround

Paul Elschot
In reply to this post by Erik Hatcher
On Saturday 28 May 2005 21:26, Erik Hatcher wrote:
>
> On May 28, 2005, at 1:07 PM, Paul Elschot wrote:
...
> > I'll leave the build.xml stand alone with constants for the  
> > environment.
> > It was derived from a lucene build.xml of a few eons ago, so
> > I hope someone can still integrate it...
>
> I will be happy to integrate it.  Let me know when it's ready for  
> contrib committing and consider it done.

There are still some intended warning messages on stdout
during the tests. This might be improved by extending the
tests to expect the messages on another invisible stream,
but I have no intention of making that change myself:

http://issues.apache.org/bugzilla/show_bug.cgi?id=34331
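
One way a test could expect such messages on a separate, invisible stream is sketched below. This is not taken from the surround test code: the class name `CapturedOutput` and the method `capture` are made up for illustration. The helper temporarily swaps `System.out` for an in-memory buffer so that a test can assert on the warning text instead of letting it reach the console.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Hypothetical helper, not part of the surround contribution.
final class CapturedOutput {
    // Runs the action with System.out redirected to a buffer
    // and returns everything the action printed.
    static String capture(Runnable action) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream replacement = new PrintStream(buffer, true);
        System.setOut(replacement);
        try {
            action.run();
        } finally {
            replacement.flush();
            System.setOut(original); // always restore the real stdout
        }
        return buffer.toString();
    }
}
```

A test would wrap the call that is expected to warn in `CapturedOutput.capture(...)` and compare the returned string with the expected message.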

> Sure you can't become a committer, even just on a contrib/surround  
> tree?  :)

I thought this list was for technical subjects?

Regards,
Paul Elschot



Re: contrib/queryParsers/surround

Otis Gospodnetic-2
In reply to this post by Paul Elschot
--- Erik Hatcher <[hidden email]> wrote:

>
> On May 28, 2005, at 10:04 AM, Paul Elschot wrote:
> > Dear readers,
> >
> > I've started moving the surround query language
> > http://issues.apache.org/bugzilla/show_bug.cgi?id=34331
> > into the directory named by the title in my working copy of the
> lucene
> > trunk. When the tests pass I'll repost it there.
> > In case someone  needs this earlier, please holler.
>
> As for naming conventions and where this should live in contrib,  
> consider that a user will only want a single query parser and more  
> than that would be unneeded bloat in her application.  The contrib  
> pieces are all packaged as a separate JAR per directory under
> contrib.

I agree, a typical user would want just one query parser to handle all
types of queries.  So wouldn't it be better to fold Paul's Surround
query parser into Lucene's current query parser?

Otherwise, the user will have to use tricks to detect that the entered
query uses Surround query syntax and use Surround query parser instead,
no?

Otis


> My recommendation would be to put your wonderful surround parser and
>
> supporting infrastructure under contrib/surround.
>
> I'm very much looking forward to having this available!



Re: contrib/queryParsers/surround

Wolfgang Hoschek
Folding the surround syntax into the standard query parser would be  
great indeed!
I'd very much encourage the increased power and expressiveness lucene  
would gain through that.

Wolfgang.

On May 29, 2005, at 9:33 AM, Otis Gospodnetic wrote:


> I agree, a typical user would want just one query parser to handle all
> types of queries.  So wouldn't it be better to fold Paul's Surround
> query parser into Lucene's current query parser?
>
> Otherwise, the user will have to use tricks to detect that the entered
> query uses Surround query syntax and use Surround query parser  
> instead,
> no?
>
> Otis

Re: contrib/queryParsers/surround

Daniel Naber
In reply to this post by Otis Gospodnetic-2
On Sunday 29 May 2005 18:33, Otis Gospodnetic wrote:

>  So wouldn't it be better to fold Paul's Surround
> query parser into Lucene's current query parser?

I think QueryParser is already quite complicated, both from the developer
(javacc) and user perspective (AND vs. + etc) so I'd prefer to keep new
features separated.

Regards
 Daniel

--
http://www.danielnaber.de


Re: contrib/queryParsers/surround

Paul Elschot
In reply to this post by Wolfgang Hoschek
On Sunday 29 May 2005 20:03, Wolfgang Hoschek wrote:

> Folding the surround syntax into the standard query parser would be  
> great indeed!
> I'd very much encourage the increased power and expressiveness lucene  
> would gain through that.
>
> Wolfgang.
>
> On May 29, 2005, at 9:33 AM, Otis Gospodnetic wrote:
>
>
> > --- Erik Hatcher <[hidden email]> wrote:
> >
> >
> >>
> >> On May 28, 2005, at 10:04 AM, Paul Elschot wrote:
> >>
> >>
> >>> Dear readers,
> >>>
> >>> I've started moving the surround query language
> >>> http://issues.apache.org/bugzilla/show_bug.cgi?id=34331
> >>> into the directory named by the title in my working copy of the
> >>>
> >>>
> >> lucene
> >>
> >>
> >>> trunk. When the tests pass I'll repost it there.
> >>> In case someone  needs this earlier, please holler.
> >>>
> >>>
> >>
> >> As for naming conventions and where this should live in contrib,
> >> consider that a user will only want a single query parser and more
> >> than that would be unneeded bloat in her application.  The contrib
> >> pieces are all packaged as a separate JAR per directory under
> >> contrib.

The parser as contributed is the class
org.apache.lucene.queryParser.surround.parser.QueryParser
and there is also a ...surround.query package.
Please feel free to improve this, if needed.

> >>
> >>
> >
> > I agree, a typical user would want just one query parser to handle all
> > types of queries.  So wouldn't it be better to fold Paul's Surround
> > query parser into Lucene's current query parser?
> >
> > Otherwise, the user will have to use tricks to detect that the entered
> > query uses Surround query syntax and use Surround query parser  
> > instead,
> > no?

Currently, yes. Some queries are legal in both syntaxes, so that would
not be an easy thing to detect beforehand. Since the surround syntax
is quite strict, one way might be to catch a parse exception from the
surround parser, and in that case use the default parser.
It's a hack, but it might work well in practice for users who explicitly want
to use both at the same time.
Removing the lowercase operators from the surround syntax would
probably reduce the confusion to a tolerable level in this case.
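
The catch-and-fall-back idea can be sketched as follows. This is only an illustration under stated assumptions: `FallbackParsing`, `Parser`, and `SyntaxException` are hypothetical names and do not correspond to any Lucene or surround API. The strict parser is tried first, and the lenient one is used only if the strict one rejects the input.

```java
// Hypothetical stand-ins; none of these names are Lucene or surround APIs.
final class FallbackParsing {
    // Unchecked here only to keep the sketch short; a real parser would
    // more likely throw a checked parse exception.
    static final class SyntaxException extends RuntimeException {
        SyntaxException(String message) { super(message); }
    }

    // Minimal parser abstraction: turns query text into some query object Q.
    interface Parser<Q> {
        Q parse(String text);
    }

    // Try the strict parser first; fall back only when it rejects the text.
    static <Q> Q parseWithFallback(String text, Parser<Q> strict, Parser<Q> lenient) {
        try {
            return strict.parse(text);
        } catch (SyntaxException rejected) {
            return lenient.parse(text);
        }
    }
}
```

Note that a query that is legal in both syntaxes is always handled by the strict parser in this sketch, so the ambiguity is not detected, only resolved in one fixed direction.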

At the moment, the syntax is not cast in stone because afaik only
the test code depends on it.

> > Otis
> >
> >
> >
> >
> >> My recommendation would be to put your wonderful surround parser and
> >>
> >> supporting infrastructure under contrib/surround.

The only dependency on the lucene environment is currently in
the ../../ path name used to locate the lucene core jar.

Regards,
Paul Elschot



Re: contrib/queryParsers/surround

Erik Hatcher
In reply to this post by Daniel Naber
I concur with Daniel on this.  For the moment, my preference is to
bring Paul's parser into contrib/surround and let it gain some
additional exposure there.  I don't believe it's possible or even
preferable to attempt to build one query parser to rule them all.
While a decent general purpose one is handy, I'm finding that my  
projects really demand more custom parsing capabilities than the  
built-in QueryParser can handle and that the quirks of the current  
parser cause some frustrations sometimes.

Perhaps over time, the built-in QueryParser can adopt some additional  
capabilities such as supporting the SpanQuery family but let's take  
that sort of thing slowly.

     Erik


On May 29, 2005, at 2:42 PM, Daniel Naber wrote:

> On Sunday 29 May 2005 18:33, Otis Gospodnetic wrote:
>
>
>>  So wouldn't it be better to fold Paul's Surround
>> query parser into Lucene's current query parser?
>>
>
> I think QueryParser is already quite complicated, both from the  
> developer
> (javacc) and user perspective (AND vs. + etc) so I'd prefer to keep  
> new
> features separated.
>
> Regards
>  Daniel
>

Re: contrib/surround

Paul Elschot
On Monday 30 May 2005 02:44, Erik Hatcher wrote:

> I concur with Daniel on this.  For the moment, my preference is to  
> bring in Paul's parser into contrib/surround and let it gain some  
> additional exposure there.  I don't believe its possible or even  
> preferable to attempt to build one query parser to rule them all.  
> While a decent general purpose one is handy, I'm finding that my  
> projects really demand more custom parsing capabilities than the  
> built-in QueryParser can handle and that the quirks of the current  
> parser cause some frustrations sometimes.
>
> Perhaps over time, the built-in QueryParser can adopt some additional  
> capabilities such as supporting the SpanQuery family but let's take  
> that sort of thing slowly.
>

How about extending the surround parser to allow the use of all
queries currently in Lucene? The goal would be to allow as many
queries as possible.

The queries not available in the current surround parser are:
- FuzzyQuery, WildcardQuery, PrefixQuery
- SpanFirstQuery
- SpanNotQuery
- MultiPhraseQuery (or the various phrase scorers)
- optional terms/clauses

FuzzyQuery and SpanFirstQuery could be done with a prefix operator
including a number (like the nn in the nnN near operator) followed by a
single query, with appropriate restrictions.
A prefix operator followed by  a single query is currently not present, but
relatively easy to add.
SpanNotQuery always has two subqueries, so would need an infix operator
only.
MultiPhraseQuery would need an infix operator and a prefix operator, just
like the N and W operators, and a restriction to terms, truncations and OR
as subqueries.

Left truncation could also be allowed;
truncations currently have to start with a normal character.
Truncation might also be left to WildcardQuery and
PrefixQuery instead of the current "equivalent" in Surround
that uses regular expressions to find the matching terms.
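
As a rough illustration of matching terms through a regular expression, here is a minimal sketch. It assumes a truncation syntax in which `*` stands for any run of characters and `?` for exactly one character; the thread does not spell the syntax out, and `TruncationSketch` and its methods are hypothetical names, not the surround implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch; the wildcard characters are an assumption.
final class TruncationSketch {
    // Translate a truncation pattern into a regex, quoting every
    // character except the two assumed wildcards.
    static Pattern toRegex(String truncation) {
        StringBuilder regex = new StringBuilder();
        for (char c : truncation.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // Keep only the terms that match the whole pattern.
    static List<String> matchingTerms(String truncation, List<String> indexedTerms) {
        Pattern pattern = toRegex(truncation);
        List<String> matches = new ArrayList<>();
        for (String term : indexedTerms) {
            if (pattern.matcher(term).matches()) {
                matches.add(term);
            }
        }
        return matches;
    }
}
```

This sketch scans every term it is given, so a left-truncated pattern such as `*ene` is handled the same way as any other.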

That leaves the optional terms/clauses, and I can't think of an easy way to
handle these. Any ideas? OR does not work for this because it requires
at least one. The normal QueryParser syntax for this is +aa bb cc,
where bb and cc are the optional parts.
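
To make the difference between optional clauses and OR concrete, here is a toy model. It is only a sketch: documents are plain sets of terms, the "score" is a simple count rather than real Lucene scoring, and `OptionalClauseSketch` is a hypothetical name.

```java
import java.util.List;
import java.util.Set;

// Toy model of "+aa bb cc"; not Lucene scoring.
final class OptionalClauseSketch {
    // Returns -1 when a required term is missing (no match), otherwise
    // the number of optional terms present. A result of 0 is still a match.
    static int score(Set<String> docTerms, List<String> required, List<String> optional) {
        if (!docTerms.containsAll(required)) {
            return -1;
        }
        int matchedOptional = 0;
        for (String term : optional) {
            if (docTerms.contains(term)) {
                matchedOptional++;
            }
        }
        return matchedOptional;
    }

    // OR for comparison: at least one of the terms must be present.
    static boolean or(Set<String> docTerms, List<String> terms) {
        for (String term : terms) {
            if (docTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }
}
```

A document containing only `aa` matches `+aa bb cc` in this model, while `bb OR cc` rejects it, which is why OR cannot stand in for the optional parts.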

Some control over performance is outside the language.
A basic query factory must be provided to create a Lucene query
from a Surround query, and this throws an exception when
rewriting causes too many terms to be used,
much like the TooManyClauses for BooleanQuery.
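
The term limit described above can be sketched like this. It is a minimal illustration, not the surround factory itself: `LimitedTermCollector` and `TooManyTermsException` are hypothetical names, and the limit is simply a constructor argument.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a term budget enforced while a query is expanded.
final class LimitedTermCollector {
    // Unchecked stand-in for the "too many terms" failure.
    static final class TooManyTermsException extends RuntimeException {
        TooManyTermsException(String message) { super(message); }
    }

    private final int maxTerms;
    private final List<String> terms = new ArrayList<>();

    LimitedTermCollector(int maxTerms) {
        this.maxTerms = maxTerms;
    }

    // Adds one expanded term, failing once the budget is exhausted.
    void add(String term) {
        if (terms.size() >= maxTerms) {
            throw new TooManyTermsException("more than " + maxTerms + " terms");
        }
        terms.add(term);
    }

    List<String> terms() {
        return terms;
    }
}
```

Keeping the limit in the factory rather than in the syntax means the same query text can be run with different budgets.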


Regards,
Paul Elschot



Re: contrib/surround

Paul Elschot
How about putting this here:

http://wiki.apache.org/general/SummerOfCode2005

It seems to be a nice fit for the sponsor.

Regards,
Paul Elschot


On Saturday 04 June 2005 22:25, Paul Elschot wrote:

> How about extending the surround parser to allow the use of all
> queries currently in Lucene? The goal would be to allow as many
> queries as possible.

Re: contrib/surround

Erik Hatcher
Paul,

I'm swamped this weekend and all this coming week moving  
(physically).  I would be happy to mentor someone tackling these  
changes.  Could you go ahead and put your ideas on the wiki and list  
me as the ASF mentor? (I know it says ASF members and committers, but  
feel free to add it on my behalf).

     Erik

On Jun 5, 2005, at 5:07 AM, Paul Elschot wrote:

> How about putting this here:
>
> http://wiki.apache.org/general/SummerOfCode2005
>
> It seems to be a nice fit for the sponsor.
>
> Regards,
> Paul Elschot