Highlighting Quoted Phrases

Chris Harris-2
I'm using the standard Solr query language and the normal highlighting
parameters documented at
http://wiki.apache.org/solr/HighlightingParameters. Snippet generation
and highlighting are working pretty well, but my testers have
discovered something they find borderline unacceptable. If they search
for

    "stock market"

(with quotes), then Solr correctly returns only documents where
"stock" and "market" appear as adjacent words. Two problems though:
First, Solr is willing to pick snippets where only one of the terms
appears, e.g.

    ...and changes in the <b>market</b> regulation environment...

Second, even when Solr picks a snippet that indeed has "stock" and
"market" adjacent to one another, it still highlights any non-adjacent
instances of "stock" and "market", e.g.

    ... huge <b>stock</b> sales due to recent increases in <b>stock</b> <b>market</b> prices...

(In the latter case the first instance of "stock" should not be highlighted.)

My testers say that both of these behaviors are incorrect, because
when people search for "stock market", they're not that interested in
the parts of the document where "stock" and "market" do not appear
together. I'm inclined to agree. I'm not sure there's an easy fix,
though, is there? The Lucene highlighter code seems to think only in
terms of terms, rather than any higher-level constructs.
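
To make the distinction concrete, here is a minimal Python sketch. It is not Lucene or Solr code, and it ignores analyzers, stemming, and stop words; it just splits on word characters and lowercases. The function names (highlight_terms, phrase_spans, highlight_phrase, contains_phrase) are made up for illustration. highlight_terms wraps every occurrence of any query term, which is roughly the behavior described above. highlight_phrase wraps terms only where the whole phrase appears as consecutive tokens, and contains_phrase could be used to reject snippets that lack the phrase.

```python
import re


def tokenize(text):
    """Return (lowercased token, start offset, end offset) for each word."""
    return [(m.group().lower(), m.start(), m.end())
            for m in re.finditer(r"\w+", text)]


def _wrap(text, spans, pre, post):
    """Wrap each (start, end) span of text in pre/post markers."""
    out, last = [], 0
    for start, end in spans:
        out.append(text[last:start])
        out.append(pre + text[start:end] + post)
        last = end
    out.append(text[last:])
    return "".join(out)


def highlight_terms(text, terms, pre="<b>", post="</b>"):
    """Term-level highlighting: mark every occurrence of any query term."""
    wanted = {t.lower() for t in terms}
    spans = [(s, e) for tok, s, e in tokenize(text) if tok in wanted]
    return _wrap(text, spans, pre, post)


def phrase_spans(text, terms):
    """Offsets of tokens that belong to a full, in-order, adjacent phrase match."""
    phrase = [t.lower() for t in terms]
    if not phrase:
        return []
    tokens = tokenize(text)
    spans, i, n = [], 0, len(phrase)
    while i + n <= len(tokens):
        window = tokens[i:i + n]
        if [tok for tok, _, _ in window] == phrase:
            spans.extend((s, e) for _, s, e in window)
            i += n  # skip past this match (no overlapping matches)
        else:
            i += 1
    return spans


def highlight_phrase(text, terms, pre="<b>", post="</b>"):
    """Phrase-level highlighting: mark terms only inside a phrase match."""
    return _wrap(text, phrase_spans(text, terms), pre, post)


def contains_phrase(text, terms):
    """True if the snippet contains the phrase as adjacent tokens."""
    return bool(phrase_spans(text, terms))
```

With the text "huge stock sales due to recent increases in stock market prices" and the terms ["stock", "market"], highlight_terms marks all three words, while highlight_phrase leaves the first "stock" alone and marks only the adjacent pair. contains_phrase returns False for "and changes in the market regulation environment", so a snippet picker built on it would skip that fragment.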

Has anyone here dealt with this issue? Maybe I need to try the Lucene list.

Thanks,
Chris

Re: Highlighting Quoted Phrases

Brian Whitman

On Mar 25, 2008, at 6:31 PM, Chris Harris wrote:

> working pretty well, but my testers have
> discovered something they find borderline unacceptable. If they search
> for
>
>    "stock market"
>
> (with quotes), then Solr correctly returns only documents where
> "stock" and "market" appear as adjacent words. Two problems though:
> First, Solr is willing to pick snippets where only one of the terms
> appears, e.g.
>
>    ...and changes in the <b>market</b> regulation environment...


I recently asked about the same thing. There's a patch in Lucene (not
in trunk yet) to support this.

It would take some amount of work to get it into Solr, but I haven't
investigated yet.

-b


Re: Highlighting Quoted Phrases

Vinci
Hi,

Would it be easier to turn off highlighting when viewing the full document (keeping summary highlighting available) and use JavaScript to do the matching instead? (That is, as long as highlighting is only needed at runtime, when looking at a specific document.)

Thank you,
Vinci

Re: Highlighting Quoted Phrases

Chris Harris-2
In reply to this post by Brian Whitman
On Tue, Mar 25, 2008 at 4:25 PM, Brian Whitman <[hidden email]> wrote:

>
>  On Mar 25, 2008, at 6:31 PM, Chris Harris wrote:
>
>  > working pretty well, but my testers have
>  > discovered something they find borderline unacceptable. If they search
>  > for
>  >
>  >    "stock market"
>  >
>  > (with quotes), then Solr correctly returns only documents where
>  > "stock" and "market" appear as adjacent words. Two problems though:
>  > First, Solr is willing to pick snippets where only one of the terms
>  > appears, e.g.
>  >
>  >    ...and changes in the <b>market</b> regulation environment...
>
>
>  I recently asked about the same thing. There's a patch in lucene (not
>  in trunk yet) to support this.

Oh dear, you did ask the same question very recently. Sorry to re-ask
the same thing, everybody.

For the record, that thread is called "highlighting pt2: returning
tokens out of order from PhraseQuery", and it's (currently anyway)
available at:

http://www.nabble.com/highlighting-pt2%3A-returning-tokens-out-of-order-from-PhraseQuery-to16156718.html