Searching API: QueryParser vs Programmatic queries

Irving, Dave
Hi,

I'm very new to Lucene, so sorry if my question seems pretty dumb.

In the application I'm writing, I've been "struggling with myself" over
whether I should be building up queries programmatically, or using the
QueryParser.

My searchable fields are driven by meta-data, and I only want to support
a few query types. It seems "cleaner" to build the queries up
programmatically rather than converting the query to a string and
throwing it through the QueryParser.

However, then we hit the problem that the QueryParser takes care of
analysing the search strings, so to do this we'd have to write some
utility code to perform the analysis as we're building up the queries /
terms.

And then I think "might as well just use the QueryParser!".

So here's what I'm wondering (which probably sounds very dumb to
experienced Lucene users):

- Is there maybe some room for more utility classes in Lucene which make
this easier? E.g. when building up a document, we don't have to worry
about running content through an analyser, but unless we use
QueryParser, there doesn't seem to be corresponding behaviour on the
search side.
- So, I'm thinking of some kind of factory / builder, where you
can register an Analyser (possibly a per-field wrapper), which is then
applied per field as the query is being built up programmatically.

Maybe this is just an "extraction" refactoring to take this behaviour
out of QueryParser (which could delegate to it).

The result could be that more users opt for a programmatic build-up of
queries (because it's become easier to do) rather than falling back on
QueryParser in cases where it may not be the best choice.
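
A minimal sketch of the kind of builder proposed above, to make the idea concrete. Everything in it is hypothetical: TermAnalyzer is a plain-Java stand-in for a Lucene Analyzer, AnalyzingQueryBuilder and SketchAnalyzers are invented names, and the builder renders a string where a real one would presumably return a Query. It only shows analysis being applied per field while clauses are added.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for a Lucene Analyzer: turns raw text into terms. */
interface TermAnalyzer {
    List<String> analyze(String text);
}

/** Two toy analyzers, for illustration only. */
final class SketchAnalyzers {
    private SketchAnalyzers() {}

    /** Splits on whitespace and lower-cases each word. */
    static final TermAnalyzer LOWERCASE_WORDS = text -> {
        List<String> terms = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                terms.add(word.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    };

    /** Keeps the whole value as one untouched term (ids, codes, dates). */
    static final TermAnalyzer KEYWORD = text -> Collections.singletonList(text);
}

/** Hypothetical builder: analyzes each value with the analyzer registered for its field. */
final class AnalyzingQueryBuilder {
    private final Map<String, TermAnalyzer> perField = new HashMap<>();
    private final TermAnalyzer fallback;
    private final List<String> clauses = new ArrayList<>();

    AnalyzingQueryBuilder(TermAnalyzer fallback) {
        this.fallback = fallback;
    }

    /** Registers the analyzer to use for one field. */
    AnalyzingQueryBuilder register(String field, TermAnalyzer analyzer) {
        perField.put(field, analyzer);
        return this;
    }

    /** Adds one clause per analyzed term; "required" stands in for MUST vs SHOULD. */
    AnalyzingQueryBuilder add(String field, String text, boolean required) {
        TermAnalyzer analyzer = perField.getOrDefault(field, fallback);
        for (String term : analyzer.analyze(text)) {
            clauses.add((required ? "+" : "") + field + ":" + term);
        }
        return this;
    }

    /** Renders the clauses for inspection; a real builder would return a Query. */
    String build() {
        return String.join(" ", clauses);
    }
}
```

The caller never touches an analyzer directly: it registers them once and then adds field/value pairs, which mirrors what happens on the indexing side when a document is added.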


Sorry if I rambled too much :o)

Dave


This e-mail and any attachment is for authorised use by the intended recipient(s) only. It may contain proprietary material, confidential information and/or be subject to legal privilege. It should not be copied, disclosed to, retained or used by, any other party. If you are not an intended recipient then please promptly delete this e-mail and any attachment and all copies and inform the sender. Thank you.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Searching API: QueryParser vs Programmatic queries

Otis Gospodnetic-2
Dave,
You said you are new to Lucene and you didn't mention this class explicitly, so you may not be aware of it yet: PerFieldAnalyzerWrapper.
It sounds like this may be what you are after.

Otis

RE: Searching API: QueryParser vs Programmatic queries

Irving, Dave
In reply to this post by Irving, Dave
Hi Otis,

Thanks for your reply.
Yeah, I'm aware of PerFieldAnalyzerWrapper, and I think it could help in
the solution, but not on its own.
Here's what I mean:

When we build a document Field, we supply either a String or a Reader.
The framework takes care of running the contents through an Analyser
(per field or otherwise) when we add the document to an index.

However, on the searching side of things, we don't have similar
functionality unless we use the QueryParser.
If we build queries programmatically, then we have to make sure (by hand)
that we run search terms through the appropriate analyser whilst
constructing the query.

It's in this area that I wonder whether additional utility classes could
make programmatic construction of queries somewhat easier.

Dave

Re: Searching API: QueryParser vs Programmatic queries

Raghavendra Prabhu
If I understand correctly, is it that you don't want to make use of the
query parser?

You need to parse a query string without using the query parser, construct
the query, and still have an analyzer applied to the resulting search.


RE: Searching API: QueryParser vs Programmatic queries

Irving, Dave
In reply to this post by Irving, Dave
> You need to parse a query string without using query parser and
> construct the query and still want an analyzer applied on the outcome
> search

Not quite. The user is presented with a list of (UI) fields, and each
field already knows whether it's an "OR", "AND", etc.
So, there is no query String as such.
For this reason, it seems to make more sense to build the query up
programmatically, as my field meta data can drive this.
However, if I do that, I have to do the work of extracting terms by
running each field's text through an analyser manually.
The query parser already does this work.

So, right now, if I'm being lazy, the easiest thing to do is construct a
query string based on the meta data, and then run that through the query
parser. This just doesn't -- feel right -- from a design perspective
though :o)

The logic I could see being extracted out would be some of the stuff in
QueryParser#getFieldQuery(String field, String queryText).


Re: Searching API: QueryParser vs Programmatic queries

Marvin Humphrey

On May 22, 2006, at 8:44 AM, Irving, Dave wrote:

> So, right now, if I'm being lazy, the easiest thing to do is construct a
> query string based on the meta data, and then run that through the query
> parser. This just doesn't -- feel right -- from a design perspective
> though :o)

How about building a larger BooleanQuery by combining the output of  
the QueryParser with custom-built Query objects based on your metadata?

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


Re: Searching API: QueryParser vs Programmatic queries

jjlarrea
In reply to this post by Irving, Dave
At 10:15 AM +0100 5/22/06, Irving, Dave wrote:

>- Is there maybe some room for more utility classes in Lucene which make
>this easier? E.g. when building up a document, we don't have to worry
>about running content through an analyser, but unless we use
>QueryParser, there doesn't seem to be corresponding behaviour on the
>search side.
>- So, I'm thinking of some kind of factory / builder, where you
>can register an Analyser (possibly a per-field wrapper), which is then
>applied per field as the query is being built up programmatically.
>
>Maybe this is just an "extraction" refactoring to take this behaviour
>out of QueryParser (which could delegate to it).
>
>The result could be that more users opt for a programmatic build-up of
>queries (because it's become easier to do) rather than falling back on
>QueryParser in cases where it may not be the best choice.

I concur with your thoughts that there is room for such utility classes, and that those would increase the use of programmatic queries.  I say this as a developer who also "lazed out" and opted to simply construct a string and let the QP do all the work (but who then had to subclass and finally copy-and-modify QP to make it conform to requirements).

The underlying issue may be that there are two quite different concerns bundled into QueryParser:
 - Parsing a string into a set of discrete query requests
 - Constructing Query objects to meet those requests

If you take a look at http://issues.apache.org/jira/browse/LUCENE-344 you'll see that someone else (Matthew Denner) also had this belief, and went so far as to implement a QueryFactory interface and a couple of implementing classes.  One has the construction logic now found in QueryParser.  Then there is a decorator class which adds the functionality of MultiFieldQueryParser and another which lower-cases terms.

Perhaps something along those lines should be considered for the next break in API continuity, e.g. Lucene 2.0.  It seems much cleaner than subclassing QP when all that is needed is a variant in Query construction logic, and it also provides a higher-level interface for constructing Query objects (especially TermQuery) like you were proposing.  Unfortunately the actual LUCENE-344 patch appears to be out of date with changes in QueryParser, MultiFieldQueryParser, etc.  But perhaps just the QueryFactory part would be a good starting point for what you want to do.
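
To illustrate the separation described above (parsing on one side, Query construction on the other), here is a rough sketch of a factory interface with two decorators. It is not the LUCENE-344 patch: the interface shape, the class names and the string output are all assumptions made for illustration, with a string standing in for a real Query object.

```java
import java.util.Locale;

/** Hypothetical factory: a parser would ask it for queries instead of building them itself. */
interface QueryFactory {
    String termQuery(String field, String text);
}

/** Base construction logic; renders "field:text" as a stand-in for a real Query. */
final class BasicQueryFactory implements QueryFactory {
    @Override
    public String termQuery(String field, String text) {
        return field + ":" + text;
    }
}

/** Decorator that lower-cases terms before delegating. */
final class LowerCasingQueryFactory implements QueryFactory {
    private final QueryFactory delegate;

    LowerCasingQueryFactory(QueryFactory delegate) {
        this.delegate = delegate;
    }

    @Override
    public String termQuery(String field, String text) {
        return delegate.termQuery(field, text.toLowerCase(Locale.ROOT));
    }
}

/** Decorator that fans one request out over several fields and ignores the requested field. */
final class MultiFieldQueryFactory implements QueryFactory {
    private final QueryFactory delegate;
    private final String[] fields;

    MultiFieldQueryFactory(QueryFactory delegate, String... fields) {
        this.delegate = delegate;
        this.fields = fields.clone();
    }

    @Override
    public String termQuery(String ignoredField, String text) {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(delegate.termQuery(fields[i], text));
        }
        return sb.append(')').toString();
    }
}
```

Because each variation is a wrapper around the same interface, changing how queries are constructed means composing factories rather than subclassing or copying the parser.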

Anyway, just a thought.

- J.J.

Re: Searching API: QueryParser vs Programmatic queries

Erick Erickson
In reply to this post by Irving, Dave
There's a long screed that I'm leaving at the bottom because I put effort
into it and I like to rant. But here's, perhaps, an approach.

Maybe I'm mis-interpreting what you're trying to do. I'm assuming that you
have several search fields (I'm not exactly sure what "driven by meta-data"
means in this case, but what the heck).

It seems to me that you can always do something like:

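// Lucene 1.9/2.0-style signatures; parse() throws ParseException.
BooleanQuery bq = new BooleanQuery();

// "field1" is the default field for terms in the fragment.
QueryParser qp1 = new QueryParser("field1", analyzer);
Query q1 = qp1.parse("<your query fragment here>");
bq.add(q1, BooleanClause.Occur.MUST);  // or SHOULD / MUST_NOT, as the field requires

QueryParser qp2 = new QueryParser("field2", analyzer);
Query q2 = qp2.parse("<your query fragment here>");
bq.add(q2, BooleanClause.Occur.MUST);

.
.
.

and eventually submit the query you've built up in bq.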

You can arbitrarily build these up. In other words, your q1, q2, q3, etc can
be the same field for the first N clauses, and another field for the second
M clauses. Or you could build up the <query fragment> to consist of all the
terms for a particular field.


As I said, I have no clue whether this is possible in your application. If
not, see below <G>.

********************Screed starts here***********************************

I've had similar arguments with myself. But I'm getting less forgiving with
myself when I reinvent wheels, and firmly slap my own wrists.

Pretend you are talking to your boss/technical lead/coworker. I'm assuming
you actually want to get a product out the door. Your manager asks: "How can
you justify spending the time to create, debug and maintain code that has
already been written for you for the sake of cleanliness at the expense of
the other things you could be contributing instead"?

There are some very good answers to this, but most of the ones I've tried to
use involve a lot of hand-waving on the order of "If we ever extend the
application......", or "It would be cleaner".  At which point the
conversation *should* go something like this....

Manager: "let me get this straight. You can spend 10 minutes right now
implementing the pass-to-the-query-parser solution and an unknown amount
(but probably way more than your initial estimate)
implementing/debugging/testing a 'cleaner' solution. Is that right?"

You: "Yes but....."

Manager: "Furthermore, the functionality you want to add is *already* built
into the 'use-the-parser' solution, right?"

You: "Yes, but...."

Manager: "And the amount of time you'll spend debugging this, not to mention
the amount of *other* people's time you'll spend identifying any bugs and
figuring out that they're in this new code, will only increase the longer any
bugs go undetected, right?"

You: "Yes, but..."

Manager: "Do it the use-the-parser way. We can always implement it the other
way if we have time. It doesn't cost us *any* time to implement the 'use the
query parser' way, whereas your way has a measurable cost now, an unknown
cost in the future and no measurable gain. Add a big comment if you want
about how I forced you to do this ugly thing.....".

Of course there are good reasons to take the time now *if* it will save
time/effort in the future. But this sure doesn't seem like one of those
situations to me. Not to mention that it'll be MUCH simpler for the next
person looking at it to understand. Here are several things off the top of
my head that'll become maintenance issues for a custom solution, that are
*all* taken care of by the use-the-parser solution:

1> How are you going to handle stop words?
2> Will you ever want to change analyzers to, say, keep URLs together? Or
maybe break them up?
3> What happens if you want to use the RegularExpressionAnalyzer to, say,
remove all punctuation or other user-entered junk?
4> Will you remember all the ins-and-outs of this code in even 1 month? What
about the next poor joker who has to figure it out?

None of this is to say that your suggestion that there be utility classes
that allow this sort of thing doesn't have merit. But I have to wonder
whether it would be effort well spent for you at this time, in this project
<G>.

As you can see, this is one of my hot-button issues <G>. If you want to
really see me go off the deep end, just *mention* premature
optimizations........

Best
Erick

RE: Searching API: QueryParser vs Programmatic queries

Chris Hostetter-3
In reply to this post by Irving, Dave

: Not quite. The user is presented with a list of (UI) fields, and each
: field already knows whether its an "OR" "AND" etc.
: So, there is no query String as such.
: For this reason, it seems to make more sense to build the query up
: programmatically - as my field meta data can drive this.
: However, if I do that, I have to do the work of extracting terms by
: running through an analyser for each field manually.
: This is also done by the query parser.

typically, when building queries up from form data, each piece of data falls
into one of 2 categories:

  1) data which doesn't need to be analyzed because the field it's going to
     query on wasn't tokenized (ie: a date field, or a numeric field, or a
     boolean field)
  2) data which is typed by the user in a text box, and not only needs to
     be analyzed, but may also need some parsing (ie: to support "quoted
     phrases" or +mandatory and -prohibited terms)

in the first case, build your query clauses programmatically.

in the second case make a QueryParser on the fly with the defaultField set
to whatever makes sense and let it handle parsing the user's text (and
applying the correct analyzer using PerFieldAnalyzerWrapper).  if there are
special characters you want it to ignore, then escape them first.

i discussed this a little more recently...

http://www.nabble.com/RE%3A+Building+queries-t1635907.html#a4436416



-Hoss


FW: Searching API: QueryParser vs Programmatic queries

Irving, Dave
In reply to this post by Irving, Dave
Erick Erickson wrote:

...

> It seems to me that you can always do something like:
> BooleanQuery bq = new BooleanQuery();
> QueryParser qp1 = new QueryParser("field1", analyzer);
> Query q1 = qp1.parse("<your query fragment here>"); bq.add(q1, BooleanClause.Occur.MUST);
> QueryParser qp2 = new QueryParser("field2", analyzer);
> Query q2 = qp2.parse("<your query fragment here>"); bq.add(q2, BooleanClause.Occur.MUST);


> and eventually submit the query you've built up in bq.

<snip/>

Thanks for the idea - someone else also mentioned it yesterday, and I
think it's possibly the way I'll go.
The only problem I have with this is that QueryParser also parses out
operators in the queries. No problem - I could just override the
relevant factory methods and throw an exception to indicate the operator
isn't supported.
However, QueryParser also parses out operators like "+" etc - which I
(and my Analyser) may want to include in the search (for example, I
might be searching for C++). So then, I've also got to escape these in
the user query... And so the work mounts up :o)
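
For the escaping step, a small self-contained sketch is given here. QuerySyntaxEscaper is a hypothetical helper, and the character list is an assumption based on the query syntax's usual special characters. QueryParser itself offers a static escape(String) method in the releases where it is available, so check your version before writing your own.

```java
/** Hypothetical helper: backslash-escapes characters the query syntax treats as operators. */
final class QuerySyntaxEscaper {
    // Assumed set of special characters: \ + - ! ( ) : ^ [ ] " { } ~ * ? | &
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    private QuerySyntaxEscaper() {}

    static String escape(String input) {
        StringBuilder sb = new StringBuilder(input.length() + 8);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');  // prefix each special character with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

With this, a search for C++ would be handed to the parser as C\+\+, so the plus signs reach the analyser as text and are not read as "required" operators. Whether they survive analysis still depends on the analyser in use.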

> As I said, I have no clue whether this is possible in your application.
> If not, see below <G>.

> ********************Screed starts here***********************************

> I've had similar arguments with myself. But I'm getting less forgiving
> with myself when I reinvent wheels, and firmly slap my own wrists.

> Pretend you are talking to your boss/technical lead/coworker. I'm assuming
> you actually want to get a product out the door. Your manager asks: "How
> can you justify spending the time to create, debug and maintain code that
> has already been written for you for the sake of cleanliness at the
> expense of the other things you could be contributing instead"?

In this instance, I have the luxury that this project is something I'm
doing in my own time as a hobby. I can therefore afford time to mull
over my design, and maybe even contribute something back to the Lucene
community in the process. After all, I owe Lucene big time :o)

<snip/>

> As you can see, this is one of my hot-button issues <G>.

:o)

> If you want to really see me go off the deep end,
> just *mention* premature optimizations........

I'd have to agree with you on that one.... :o)

> Best
> Erick


Dave



---------------------------------------------------------------------

RE: Searching API: QueryParser vs Programatic queries

Irving, Dave
In reply to this post by Irving, Dave
J.J. Larrea wrote:

> I concur with your thoughts that there is room for such
> utility classes, and that those would increase the use of
> programmatic queries.  I say this as a developer who also
> "lazed out" and opted to simply construct a string and let
> the QP do all the work (but who then had to subclass and
> finally copy-and-modify QP to make it conform to requirements).
>
> The underlying issue may be that there are two quite
> different concerns bundled into QueryParser:
>  - Parsing a string into a set of discrete query requests
>  - Constructing Query objects to meet those requests

Yeah - I agree. It seems that QueryParser has too many responsibilities.

> If you take a look at
> http://issues.apache.org/jira/browse/LUCENE-344 you'll see
> that someone else (Matthew Denner) also had this belief, and
> went so far as to implement a QueryFactory interface and a
> couple of implementing classes.  One has the construction
> logic now found in QueryParser.  Then there is a decorator
> class which adds the functionality of MultiFieldQueryParser
> and another which lower-cases terms.

Thanks for the link - I'll certainly take a look at that patch.

> Perhaps something along those lines should be considered
> for the next break in API continuity, e.g. Lucene 2.0.  It
> seems much cleaner than subclassing QP when all that is
> needed is a variant in Query construction logic, and it also
> provides a higher-level interface for constructing Query
> objects (especially TermQuery) like you were proposing.  
> Unfortunately the actual LUCENE-344 patch appears out of date
> with changes in QueryParser, MultiFieldQueryParser, etc.  But
> perhaps just the QueryFactory part would be a good starting
> point for what you want to do.
>
> Anyway, just a thought.

Many thanks for the ideas

> - J.J.

Dave



---------------------------------------------------------------------

RE: Searching API: QueryParser vs Programatic queries

Irving, Dave
In reply to this post by Irving, Dave
Chris Hostetter wrote:

> typically, when building queries up from form data, each piece
> of data falls into one of 2 categories:
>
>   1) data which doesn't need to be analyzed because the field it's
>      going to query on wasn't tokenized (ie: a date field, or a
>      numeric field, or a boolean field)

But couldn't even these simple field types also need analysing?
E.g., I could have a very simple number field - so I need to pad it at
both index and search time to a certain number of characters. Isn't a
custom TokenFilter added to an analyser a reasonable way to handle such
fields (keeping the logic of how to transform input into a Term all in
one place)?
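
A minimal sketch of the padding idea, in plain Java with no Lucene dependency: the same (hypothetical) helper would be applied to the value when the document is indexed and again when the search term is built, so that string comparison on the field agrees with numeric order. The width of 10 and the restriction to non-negative values are assumptions made for the example.

```java
/** Hypothetical helper shared by the indexing and search paths. */
final class NumberPadder {
    private static final int WIDTH = 10; // assumed fixed width; must be identical at index and search time

    private NumberPadder() {}

    static String pad(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not handled in this sketch");
        }
        String digits = Long.toString(value);
        if (digits.length() > WIDTH) {
            throw new IllegalArgumentException("value does not fit in " + WIDTH + " digits");
        }
        StringBuilder out = new StringBuilder(WIDTH);
        for (int i = digits.length(); i < WIDTH; i++) {
            out.append('0'); // left-pad with zeros
        }
        return out.append(digits).toString();
    }
}
```

Wrapping this in a TokenFilter, as suggested above, would keep the transformation inside the analyser so that neither the indexing code nor the query-building code has to remember to call it.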

>   2) data which is typed by the user in a text box, and not only needs
>      to be analyzed, but may also need some parsing (ie: to support
>      "quoted phrases" or +mandatory and -prohibited terms)
>
> in the first case, build your query clauses programmatically.
>
> in the second case make a QueryParser on the fly with the
> defaultField set to whatever makes sense and let it handle
> parsing the user's text (and applying the correct analyzer
> using PerFieldAnalyzerWrapper).  if there are special characters you
> want it to ignore, then escape them first.
 
Yeah, one of my fields is "keywords" - into which the user can type a
list of terms - all of which need to be analysed.
It seems all I logically want to do is extract the terms from the input
- running each through an analyser - and then combine them into a
boolean query.
However, a typical search will be "C++" - so I'm also going to have to
escape the content - because I'm using a QueryParser.

I certainly agree that I can accomplish what I want by escaping, running
through the query parser and combining using boolean queries.

I think the query factory patch (LUCENE-344) is pretty close to what I
was trying to get at.
The ability to say the following would be really cool:

Query keywordsQuery = queryFactory.getFieldQuery("keywords",
someAnalyser, keywordsText);

The factory lets us get to the guts of running the content through the
analyser, and extracting a query in various ways - without having to do
extra work (escaping, overriding unsupported methods) so that we can
achieve the same goal with the Query Parser. Seems like a nice
Separation of Concerns.

The QueryParser then adds the -- parsing -- on top of this, but can
delegate for query delegation.
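
To make the shape of that idea concrete, here is a self-contained sketch. None of these types are Lucene classes, and this is not the LUCENE-344 patch's API: `TermAnalyser`, `FieldTermsQuery`, `SketchQueryFactory` and `WhitespaceLowerCaseAnalyser` are hypothetical stand-ins, chosen only to show text going through an analyser and coming out as a query with no query-syntax parsing (and so no escaping) in between.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Stand-in for an analyser: turns raw text into index terms. Not a Lucene type. */
interface TermAnalyser {
    List<String> analyse(String text);
}

/** Stand-in for a query object: a field plus the terms to match. Not a Lucene type. */
final class FieldTermsQuery {
    final String field;
    final List<String> terms;

    FieldTermsQuery(String field, List<String> terms) {
        this.field = field;
        this.terms = terms;
    }
}

/** Hypothetical factory: analyses the text and builds the query; no operator parsing happens here. */
final class SketchQueryFactory {
    FieldTermsQuery getFieldQuery(String field, TermAnalyser analyser, String text) {
        return new FieldTermsQuery(field, new ArrayList<>(analyser.analyse(text)));
    }
}

/** Toy analyser: splits on whitespace and lower-cases, leaving characters such as '+' untouched. */
final class WhitespaceLowerCaseAnalyser implements TermAnalyser {
    @Override
    public List<String> analyse(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }
}
```

With these stand-ins, `new SketchQueryFactory().getFieldQuery("keywords", new WhitespaceLowerCaseAnalyser(), "C++ Lucene")` produces the terms `c++` and `lucene` without any escaping step. A parser could sit on top and call the same factory for each clause it recognises.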

> i discussed this a little more recently...
>
> http://www.nabble.com/RE%3A+Building+queries-t1635907.html#a4436416
>

Indeed. I apologise - I should have stuck with that original thread.
Sorry for the carelessness.

>
> -Hoss
>

Dave
 



---------------------------------------------------------------------

RE: Searching API: QueryParser vs Programatic queries

Irving, Dave
In reply to this post by Irving, Dave
> The QueryParser then adds the -- parsing -- on top of this, but can
> delegate for query delegation.

That should be "query creation", of course.
 

> -----Original Message-----
> From: Irving, Dave [mailto:[hidden email]]
> Sent: 23 May 2006 08:30
> To: [hidden email]
> Subject: RE: Searching API: QueryParser vs Programatic queries

---------------------------------------------------------------------