Highlighter and complex queries

Marios Skounakis
  Hi all,

Suppose the user enters the following query using a textbox interface:
"rate based optimization" (as a phrase query, including the quotes). The
query is parsed with QueryParser, rewritten, and passed to the
highlighter. Then getBestTextFragments is called.

The method returns some fragments that contain only one of the words in
the search phrase. Isn't this wrong? Since this is a phrase query,
shouldn't the highlighter look for fragments that contain all three
words, and, even more, only for fragments in which the three words are
adjacent (based on the token stream returned by the analyzer)?

Thanks in advance,
Marios
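
The symptom described above is what one would expect if highlighting is driven by the query's individual terms rather than by the phrase as a unit. The following is a minimal toy model of that behaviour, not the Lucene Highlighter API: the class `TermHighlightDemo`, the method `highlightTerms`, and the sample fragment are all hypothetical names and data invented for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of term-based highlighting. This is NOT the Lucene Highlighter API;
// all names here are hypothetical.
class TermHighlightDemo {

    // Wraps every token found in queryTerms in <B>...</B>.
    // Token positions and adjacency are ignored entirely.
    static String highlightTerms(List<String> tokens, Set<String> queryTerms) {
        StringBuilder out = new StringBuilder();
        for (String token : tokens) {
            if (out.length() > 0) {
                out.append(' ');
            }
            if (queryTerms.contains(token)) {
                out.append("<B>").append(token).append("</B>");
            } else {
                out.append(token);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The phrase query reduced to a bag of terms.
        Set<String> terms = new HashSet<>(Arrays.asList("rate", "based", "optimization"));
        List<String> fragment = Arrays.asList("the", "interest", "rate", "went", "up");
        // prints: the interest <B>rate</B> went up
        System.out.println(highlightTerms(fragment, terms));
    }
}
```

Once the phrase has been reduced to a set of terms, a fragment containing only "rate" looks like a hit, even though the phrase itself never occurs in it.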

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Highlighter and complex queries

Erik Hatcher

On Apr 29, 2006, at 1:59 AM, Marios Skounakis wrote:

> Suppose the user enters the following query using a textbox  
> interface: "rate based optimization" (as a phrase query, including  
> the quotes). The query is parsed using QueryParser, then it is  
> rewritten, and given to the highlighter. Then, method  
> getBestTextFragments is called.
>
> The method returns some fragments which contain only one of the  
> words in the search phrase. Isn't this wrong? Since this is a  
> phrase query, shouldn't the highlighter look for fragments which  
> contain all three words, and even more, only for fragments in which  
> the three words are adjacent (based on the token stream returned
> by the analyzer)?

"Wrong" is subjective in this case. I personally prefer exact
highlighting based on what matched, not just individual term
extraction. In one project I converted all queries to a SpanQuery
and used getSpans() to do highlighting in an accurate way. That
particular code is not easily generalizable and was written under
contract, so I cannot share it, but it was actually not very complex
to do.

        Erik
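
The idea of highlighting from matched spans rather than from extracted terms can be sketched in plain Java. This is only an illustration of the principle under simplifying assumptions (exact token equality, no slop, a single phrase); it does not use Lucene's SpanQuery or getSpans(), and it is not the contract code mentioned above. The names `PhraseSpanDemo`, `findPhraseSpans`, and `highlightSpans` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of span-based phrase highlighting. This is NOT Lucene's
// SpanQuery/getSpans() API; all names here are hypothetical.
class PhraseSpanDemo {

    // Returns [start, end) token positions at which the phrase occurs
    // as a run of adjacent tokens.
    static List<int[]> findPhraseSpans(List<String> tokens, List<String> phrase) {
        List<int[]> spans = new ArrayList<>();
        if (phrase.isEmpty()) {
            return spans;
        }
        for (int start = 0; start + phrase.size() <= tokens.size(); start++) {
            if (tokens.subList(start, start + phrase.size()).equals(phrase)) {
                spans.add(new int[] {start, start + phrase.size()});
            }
        }
        return spans;
    }

    // Highlights only the tokens that fall inside a matched span.
    static String highlightSpans(List<String> tokens, List<int[]> spans) {
        boolean[] marked = new boolean[tokens.size()];
        for (int[] span : spans) {
            for (int i = span[0]; i < span[1]; i++) {
                marked[i] = true;
            }
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            String token = tokens.get(i);
            out.append(marked[i] ? "<B>" + token + "</B>" : token);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> phrase = Arrays.asList("rate", "based", "optimization");
        List<String> tokens = Arrays.asList(
                "rate", "limits", "and", "rate", "based", "optimization", "work");
        // The isolated "rate" at position 0 is left unmarked.
        System.out.println(highlightSpans(tokens, findPhraseSpans(tokens, phrase)));
    }
}
```

Because highlighting is driven by positions where the whole phrase matched, a fragment containing only one of the three words produces no spans and therefore no highlight.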

Re: Highlighter and complex queries

mark harwood
In reply to this post by Marios Skounakis
Hi Marios.

>> Isn't this wrong?
Yes, but this is an itch that no one has been sufficiently bothered by
to fix yet.
I still haven't had the time or a desperate need to implement this, so it
will probably remain that way until someone feels strongly enough about
the problem to fix it. Highlighting is not a straightforward problem if
your goal is to exactly reflect the query logic, especially if you also
try to summarise large texts AND you are dealing with complex queries
containing Spans, "NOT" clauses, nested Boolean logic, etc. Some
compromises have to be made.

My suggestion as to how this might best be approached, along with links
to some related code, is here:

http://marc.theaimsgroup.com/?l=lucene-user&m=112496111224218&w=2


This post highlights some of the intricacies involved.

http://www.gossamer-threads.com/lists/lucene/java-dev/23592#23592


Cheers
Mark