Is there a relevance to text matches?

RobinBones
The company I work for is currently using a Google Mini. I would like to
migrate to Solr for several features Solr has over the Mini.

One problem I have run into deals with relevance of text in the search
results.
I wrote a simple app to crawl our website and submit the data to the Solr
index.
I can perform a search, but the results are not in an acceptable order.
When I search for Nike, my first 20 items are all non-Nike products where
someone has mentioned a Nike product in the review.
The term Nike in a product title should have a lot higher relevance, than
the term Nike somewhere in a product review.
I would like to keep indexing the reviews, but I need to specify the title
at a much higher rate. Is that possible?
I can split the values into their own fields if that helps.

Thanks
Robin

Re: Is there a relevance to text matches?

Yonik Seeley-2
On 12/1/06, Robin Bonin <[hidden email]> wrote:
> The term Nike in a product title should have a lot higher relevance, than
> the term Nike somewhere in a product review.
> I would like to keep indexing the reviews, but I need to specify the title
> at a much higher rate. Is that possible?
> I can split the values into their own fields if that helps.

That would absolutely help. Then you could use the Lucene
QueryParser syntax for boosting.

A query like the following will treat matches in the title as roughly 10
times as important:

title:Nike^10 review_body:Nike
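
For anyone building such a query from application code, here is a minimal
Python sketch. The field names are the ones from the query above; the helper
names and the localhost URL are hypothetical, and the term is assumed to be a
single word with no characters that need escaping for the query parser.

```python
from urllib.parse import urlencode


def boosted_query(term, field_boosts):
    """Build a Lucene query that looks for `term` in every field,
    appending ^boost to each clause whose boost is not 1."""
    clauses = []
    for field, boost in field_boosts.items():
        clause = f"{field}:{term}"
        if boost != 1:
            clause += f"^{boost}"
        clauses.append(clause)
    return " ".join(clauses)


def select_url(base_url, query):
    """URL-encode the query and attach it as the `q` parameter."""
    return f"{base_url}?{urlencode({'q': query})}"


# Hypothetical endpoint; adjust to wherever your Solr instance lives.
query = boosted_query("Nike", {"title": 10, "review_body": 1})
print(query)
print(select_url("http://localhost:8983/solr/select", query))
```

The boost value of 10 is only a starting point; it usually needs tuning
against real queries.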

The dismax handler also has a way of dealing with this (see the Wiki),
and that would be something to explore after you are comfortable with
standard Lucene syntax.
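
As a rough idea of what the dismax route looks like, the sketch below builds
request parameters in Python. The per-field weights go in the `qf` parameter
and the user's raw text goes in `q`. How the handler is selected depends on
the Solr version and your solrconfig.xml; `defType=dismax` is an assumption
here (older setups select the handler by name with `qt` instead), and the
helper name is hypothetical.

```python
from urllib.parse import urlencode


def dismax_params(user_query, field_boosts):
    """Return request parameters for a dismax-style search.

    `qf` lists the fields to search, each optionally followed by ^boost.
    """
    qf = " ".join(
        field if boost == 1 else f"{field}^{boost}"
        for field, boost in field_boosts.items()
    )
    return {
        "defType": "dismax",  # assumption: version-dependent, see note above
        "q": user_query,      # plain user input, no field prefixes needed
        "qf": qf,
    }


params = dismax_params("Nike", {"title": 10, "review_body": 1})
print(params["qf"])
print(urlencode(params))
```

The practical difference from the hand-built query is that the weights live
in one parameter, so the user's text does not have to be rewritten per field.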

-Yonik

Re: Re: Is there a relevance to text matches?

Mike Klaas
On 12/1/06, Yonik Seeley <[hidden email]> wrote:

> On 12/1/06, Robin Bonin <[hidden email]> wrote:
> > The term Nike in a product title should have a lot higher relevance, than
> > the term Nike somewhere in a product review.
> > I would like to keep indexing the reviews, but I need to specify the title
> > at a much higher rate. Is that possible?
> > I can split the values into their own fields if that helps.
>
> That would absolutely help.... then you could use the lucene
> QueryParser syntax for boosting:
>
> A query like the following will count matches in the title roughly 10
> times as important:
>
> title:Nike^10 review_body:Nike
>
> The dismax handler also has a way of dealing with this (see the Wiki),
> and that would be something to explore after you are comfortable with
> standard Lucene syntax.

Note that you might want to try splitting the text into fields and
doing no further boosting--usually the idf and length normalization
factor will give very high scores to title fields without explicit
boosting.
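
To make the length normalization part concrete, here is a small Python
sketch. It assumes the classic Lucene default similarity, where the term
frequency factor is sqrt(freq) and the length norm is 1/sqrt(number of terms
in the field). The field lengths (4 and 400 terms) are made up for
illustration, and the sketch ignores the lossy one-byte encoding Lucene uses
when it stores norms.

```python
import math


def tf(freq):
    """Term frequency factor of the classic similarity."""
    return math.sqrt(freq)


def length_norm(num_terms):
    """Shorter fields get a larger norm, so a match in them counts more."""
    return 1.0 / math.sqrt(num_terms)


# One occurrence of the term in a 4-term title vs. a 400-term review.
title_factor = tf(1) * length_norm(4)
review_factor = tf(1) * length_norm(400)

print(f"title factor:  {title_factor:.3f}")
print(f"review factor: {review_factor:.3f}")
print(f"title / review: {title_factor / review_factor:.1f}")
```

With these made-up lengths a single match in the title already counts about
ten times as much as a single match in the review, before idf or any explicit
boost is applied.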

-Mike

Re: Re: Is there a relevance to text matches?

Chris Hostetter-3

: Note that you might want to try splitting the text into fields and
: doing no further boosting--usually the idf and length normalization
: factor will give very high scores to title fields without explicit
: boosting.

I've actually seen the opposite happen as well ... assuming 100,000
products, 1,000 of which are "Nike" products and have "Nike" in the title,
while only 10 products in the catalog have "Nike" in the review text --
half Nike products, half competitors ... the idf of review:Nike can boost
the score of competitor products well above the title:Nike products.
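
The numbers above can be plugged into the idf formula as a quick check. This
Python sketch assumes the classic Lucene default similarity, where
idf = 1 + ln(numDocs / (docFreq + 1)); other similarity implementations will
give different values, and the full score also includes tf, length norm and
query normalization, which are left out here.

```python
import math


def idf(num_docs, doc_freq):
    """Inverse document frequency as in the classic Lucene similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


NUM_DOCS = 100_000
title_idf = idf(NUM_DOCS, 1_000)  # "Nike" appears in 1,000 titles
review_idf = idf(NUM_DOCS, 10)    # "Nike" appears in only 10 reviews

print(f"idf(title:Nike)  = {title_idf:.2f}")
print(f"idf(review:Nike) = {review_idf:.2f}")
print(f"ratio            = {review_idf / title_idf:.2f}")
```

The rare review match comes out with the larger idf (roughly 10.1 against
5.6), which is why an explicit title boost can still be needed even after
splitting the fields.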


-Hoss