QueryParser explicit and implicit search operator

QueryParser explicit and implicit search operator

Karimi-Tabatabaie Jamal
> Hello,
>
> I have a problem with the QueryParser and the default search operator
> AND.
>
> Let me explain the problem in the hope that you can help me.
>
> I have integrated the search engine into our CRM product. To make it
> easier for the user, we decided to set the default search operator to
> 'AND'. Now we have discovered that search strings that contain OR are
> not parsed as expected:
>
> When we search for 'Hare AND Tortoise', the QueryParser parses it
> correctly to 'Hare AND Tortoise'.
> But when we search for 'Hare OR Tortoise', the QueryParser again parses
> it to 'Hare AND Tortoise'!
> In both cases all hits contain both search terms.
>
> I experimented a little, but I have no idea how to solve this.
>
There seems to be a Lucene bug for my problem
(http://issues.apache.org/jira/browse/LUCENE-167), but it also seems to
be solved in the Lucene integration on the site http://www.lucenebook.com.

> I have attached the relevant source code in case you would like to
> review the short methods search() and buildQuery(QueryParser parser).
> I use Lucene 2.0.
>
>   <<MultiSearch.java>>
> If you could give me any hints or tips, I would be grateful.
>
>
> Yours sincerely,
>
> Jamal

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: QueryParser explicit and implicit search operator

Erik Hatcher

On Feb 16, 2007, at 1:53 PM, Karimi-Tabatabaie Jamal wrote:
> There seems to be a Lucene bug for my problem
> (http://issues.apache.org/jira/browse/LUCENE-167), but it also seems
> to be solved in the Lucene integration on the site
> http://www.lucenebook.com.

Where do you see the problem solved at lucenebook.com? The "Query
parsed to:" output for this query
<http://www.lucenebook.com/search?query=aaaaa+AND+bbbbb+OR+ccccc+AND+ddddd>
is

        Query parsed to: +aaaaa bbbbb +ccccc +ddddd

The OR caused the "bbbbb" term to not be required, yet all other  
terms are required.

        Erik
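
The parsed output above can be reproduced with a small toy model. This is
only a sketch in Python, not Lucene's code: it assumes the default operator
is AND, that a term is required unless it is introduced by OR, and that an
OR also makes the preceding term optional. The function name flag_terms is
made up for this illustration.

```python
# Toy model (NOT Lucene code): mark terms as required ('+') or optional,
# assuming the default operator is AND.

def flag_terms(query):
    """Return the query in '+term term' notation; '+' marks a required term."""
    clauses = []   # list of [term, required] pairs
    conj = None    # conjunction seen just before the next term
    for token in query.split():
        if token in ("AND", "OR"):   # operators are recognized in upper case only
            conj = token
            continue
        if clauses and conj == "AND":
            clauses[-1][1] = True    # AND makes the preceding term required
        if clauses and conj == "OR":
            clauses[-1][1] = False   # OR makes the preceding term optional
        # With default AND, a term is required unless it is introduced by OR.
        clauses.append([token, conj != "OR"])
        conj = None
    return " ".join(("+" if required else "") + term for term, required in clauses)


print(flag_terms("aaaaa AND bbbbb OR ccccc AND ddddd"))  # +aaaaa bbbbb +ccccc +ddddd
print(flag_terms("Hare OR Tortoise"))                     # Hare Tortoise
print(flag_terms("Hare Tortoise"))                        # +Hare +Tortoise
```

Under these assumptions the model processes terms strictly left to right
with no operator precedence, which is why "bbbbb" ends up optional while
"ccccc" becomes required again through the AND that follows it.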



AW: QueryParser explicit and implicit search operator

Karimi-Tabatabaie Jamal
Hello Erik,

You are right for a slightly more complex query. I tested the following queries:

search at lucenebook.com with query "Query": 155 results
search at lucenebook.com with query "Lucene": 270 results
search at lucenebook.com with query "Query AND Lucene": 109 results
search at lucenebook.com with query "Query Lucene": 109 results
search at lucenebook.com with query "Query OR Lucene": 316 results

From this I guessed that it is fixed for a query like "X OR Y"! How else would you explain the 316 search results, when each of the single queries "Lucene" and "Query" returns fewer than 316?

Jamal


Re: AW: QueryParser explicit and implicit search operator

Erik Hatcher

On Feb 19, 2007, at 8:26 AM, Karimi-Tabatabaie Jamal wrote:

> You are right for a slightly more complex query. I tested the
> following queries:
>
> search at lucenebook.com with query "Query": 155 results
> search at lucenebook.com with query "Lucene": 270 results
> search at lucenebook.com with query "Query AND Lucene": 109 results
> search at lucenebook.com with query "Query Lucene": 109 results
> search at lucenebook.com with query "Query OR Lucene": 316 results
>
> From this I guessed that it is fixed for a query like "X OR Y"! How
> else would you explain the 316 search results, when each of the
> single queries "Lucene" and "Query" returns fewer than 316?


We have the default operator set to AND for the QueryParser.  So  
"Query Lucene" (without quotes) is the same as "Query AND Lucene".  
Explicit ORs override this default.

        Erik



Re: QueryParser explicit and implicit search operator

Erick Erickson
In reply to this post by Karimi-Tabatabaie Jamal
<<<How else would you explain the 316 search results, when each of the
single queries "Lucene" and "Query" returns fewer than 316?>>>
Because some documents contain "Lucene" but not "Query" and vice versa.

These results look perfectly reasonable to me too. The default operator is
AND, which is why queries 3 and 4 return the same results.

When you query on "Query", some of the 155 documents returned (109 to be
exact) ALSO contain "Lucene". The reverse is also true: when you query
"Lucene", 109 of those 270 documents also contain "Query".

So, if you subtract 109 (look familiar?) from the sum of the individual
queries for Lucene and Query, you get 316 (155 + 270 - 109), which is exactly
what you get from "Lucene OR Query". All fine from my perspective.

Best
Erick
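
The arithmetic above is plain inclusion-exclusion. A short sketch in Python,
using only the hit counts reported in this thread (the variable names are
made up for the illustration):

```python
# Hit counts reported in this thread for lucenebook.com.
hits_query = 155    # documents matching "Query"
hits_lucene = 270   # documents matching "Lucene"
hits_both = 109     # documents matching "Query AND Lucene"

# Inclusion-exclusion: documents containing both terms are counted twice
# in the sum of the single-term queries, so subtract them once.
hits_either = hits_query + hits_lucene - hits_both
print(hits_either)  # 316

# The same total, split into three disjoint groups.
only_query = hits_query - hits_both     # contain "Query" but not "Lucene"
only_lucene = hits_lucene - hits_both   # contain "Lucene" but not "Query"
print(only_query, only_lucene, hits_both)  # 46 161 109
```

So 316 hits for the OR query is what a correctly parsed OR should return,
given the other four counts.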


AW: QueryParser explicit and implicit search operator

Karimi-Tabatabaie Jamal
Exactly.

That means the last query, "Lucene OR Query", must be parsed correctly to "Lucene query" with the OR operator, even though the default operator is set to AND. Right?

But in my implementation, with the default operator set to AND, the query 'Hare OR Tortoise' is parsed to 'Hare AND Tortoise'!

Regards,
Jamal

Re: QueryParser explicit and implicit search operator

Erick Erickson
You still haven't provided the data. Here's what Mr. Hatcher needs you to
do.

Provide a very short program that demonstrates this. It should parse a query
and print out the parsed query using toString(). For instance, are you
lowercasing the query before parsing? In that case, "or" is treated not as
an operator but as a term.

We can keep going back and forth only to discover that you are not providing
complete information, or that we're assuming something you aren't, or any of
the 10,000 other ways we can talk past each other without providing crucial
information.

So, please provide a short, self-contained program that demonstrates the
problem and I'm sure Mr. Hatcher will provide an answer in a trice.

I'm going to guess that you won't be able to do this, and in the process
you'll discover that you did something like instantiating a new
QueryParser and not setting the default operator as you think. Or lowercasing
the query. Or setting the default operator on a different QueryParser than
the one you actually used. Or...

Best
Erick


Re: AW: QueryParser explicit and implicit search operator

Erik Hatcher
In reply to this post by Karimi-Tabatabaie Jamal

On Feb 19, 2007, at 10:32 AM, Karimi-Tabatabaie Jamal wrote:
> That means the last query, "Lucene OR Query", must be parsed
> correctly to "Lucene query" with the OR operator, even though the
> default operator is set to AND. Right?
>
> But in my implementation, with the default operator set to AND, the
> query 'Hare OR Tortoise' is parsed to 'Hare AND Tortoise'!

Are you perhaps lowercasing the "OR" somewhere along the way (and  
using a stop word removing analyzer)?

        Erik
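
To illustrate the suspicion above, here is a toy model in Python. It is not
Lucene code and rests on stated assumptions: operators are recognized only
in upper case, the analyzer lowercases terms, and the analyzer drops the
stop words in STOP_WORDS (an illustrative subset chosen for this sketch).
The function name parse is made up.

```python
# Toy model (NOT Lucene code) of parsing with default operator AND and a
# stop-word-removing analyzer.
STOP_WORDS = {"and", "or"}   # illustrative subset, assumed for this sketch


def parse(query):
    """Return the query in '+term term' notation; '+' marks a required term."""
    clauses = []   # list of [term, required] pairs
    conj = None    # conjunction seen just before the next term
    for token in query.split():
        if token in ("AND", "OR"):   # only upper-case operators count
            conj = token
            continue
        term = token.lower()         # the analyzer lowercases terms
        if term in STOP_WORDS:       # the analyzer drops stop words
            conj = None
            continue
        if clauses and conj == "AND":
            clauses[-1][1] = True    # AND makes the preceding term required
        if clauses and conj == "OR":
            clauses[-1][1] = False   # OR makes the preceding term optional
        clauses.append([term, conj != "OR"])
        conj = None
    return " ".join(("+" if required else "") + term for term, required in clauses)


user_input = "Hare OR Tortoise"
print(parse(user_input))           # hare tortoise
print(parse(user_input.lower()))   # +hare +tortoise
```

In this model, lowercasing the whole query string before parsing turns the
operator into the ordinary word "or", which is then discarded as a stop
word, so both remaining terms fall back to the default AND.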



AW: QueryParser explicit and implicit search operator

Karimi-Tabatabaie Jamal
In reply to this post by Erick Erickson
Hello Erik,

You gave me the clue! I had a toLowerCase call on the query.
I firmly believed that I had not programmed anything like that, since I knew about the side effect.

Thanks a lot to both of you for your help, and sorry for the trouble. Mea culpa!

Regards,
Jamal
