boosts?

TomSolrList
Hi -

I'm having a problem getting boosts to work the way I think they are
supposed to.

What I want is for documents to be returned in doc boost order, when
all the queries are constant scoring range queries. (e.g. date:[2006 TO 2007])

I believe (but am not certain) that this is supposed to be what
happens. If that's not the case, you can probably skip the rest :-)

As an example, I grabbed solr-1.1, and ran it (java -jar start.jar).

Then I modified the hd.xml example doc, to add a boost on the first
document (SP2514N)

<doc boost="100.0">

Then I loaded monitor.xml, and hd.xml

./post.sh monitor.xml
./post.sh hd.xml

I then went to the solr admin interface and queried on

id:[* TO *]

Which I believe gets mapped to a ConstantScoreRangeQuery.

So, given

http://fred:8983/solr/select/?q=id%3A%5B*+TO+*%5D&version=2.2&start=0&rows=10&indent=on&debugQuery=1

I get the result below. Note that all the results list "boost=1.0".

I would expect to see a boost of 100 on the SP2514N in the
explanation. Should I get that? I would also expect it to be at the
head of the list, but I think I'm seeing the docs in insertion order.
(If I insert hd.xml before monitor.xml, I get them in insertion order
in that case as well.)

Please let me know if my assumptions or my methods aren't correct.

Thanks,

Tom


<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">4</int>
  <lst name="params">
   <str name="rows">10</str>
   <str name="start">0</str>
   <str name="indent">on</str>
   <str name="q">id:[* TO *]</str>
   <str name="debugQuery">1</str>
   <str name="version">2.2</str>
  </lst>
</lst>
<result name="response" numFound="3" start="0">
  <doc>
   <arr name="cat"><str>electronics</str><str>monitor</str></arr>
   <arr name="features"><str>30" TFT active matrix LCD, 2560 x 1600, .25mm dot pitch, 700:1 contrast</str></arr>
   <str name="id">3007WFP</str>
   <bool name="inStock">true</bool>
   <str name="includes">USB cable</str>
   <str name="manu">Dell, Inc.</str>
   <str name="name">Dell Widescreen UltraSharp 3007WFP</str>
   <int name="popularity">6</int>
   <float name="price">2199.0</float>
   <str name="sku">3007WFP</str>
   <float name="weight">401.6</float>
  </doc>
  <doc>
   <arr name="cat"><str>electronics</str><str>hard drive</str></arr>
   <arr name="features"><str>7200RPM, 8MB cache, IDE Ultra ATA-133</str><str>NoiseGuard, SilentSeek technology, Fluid Dynamic Bearing (FDB) motor</str></arr>
   <str name="id">SP2514N</str>
   <bool name="inStock">true</bool>
   <str name="manu">Samsung Electronics Co. Ltd.</str>
   <str name="name">Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133</str>
   <int name="popularity">6</int>
   <float name="price">92.0</float>
   <str name="sku">SP2514N</str>
  </doc>
  <doc>
   <arr name="cat"><str>electronics</str><str>hard drive</str></arr>
   <arr name="features"><str>SATA 3.0Gb/s, NCQ</str><str>8.5ms seek</str><str>16MB cache</str></arr>
   <str name="id">6H500F0</str>
   <bool name="inStock">true</bool>
   <str name="manu">Maxtor Corp.</str>
   <str name="name">Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300</str>
   <int name="popularity">6</int>
   <float name="price">350.0</float>
   <str name="sku">6H500F0</str>
  </doc>
</result>
<lst name="debug">
  <str name="rawquerystring">id:[* TO *]</str>
  <str name="querystring">id:[* TO *]</str>
  <str name="parsedquery">id:[* TO *]</str>
  <str name="parsedquery_toString">id:[* TO *]</str>
  <lst name="explain">
   <str name="id=3007WFP,internal_docid=8">
1.0 = (MATCH) ConstantScoreQuery(id:[-}), product of:
   1.0 = boost
   1.0 = queryNorm
</str>
   <str name="id=SP2514N,internal_docid=9">
1.0 = (MATCH) ConstantScoreQuery(id:[-}), product of:
   1.0 = boost
   1.0 = queryNorm
</str>
   <str name="id=6H500F0,internal_docid=10">
1.0 = (MATCH) ConstantScoreQuery(id:[-}), product of:
   1.0 = boost
   1.0 = queryNorm
</str>
  </lst>
</lst>
</response>




Re: boosts?

Yonik Seeley-2
On 12/27/06, Tom <[hidden email]> wrote:
> I'm having a problem getting boosts to work the way I think they are
> supposed to.

Do you have a specific relevance problem you are trying to solve, or
just testing things out?

> What I want is for documents to be returned in doc boost order, when
> all the queries are constant scoring range queries. (e.g. date:[2006 TO 2007])

They are *constant scoring* range queries :-)  Index-time boosts
currently don't factor in.
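
To make the point above concrete, here is a minimal sketch (not Lucene code) of why the example returned documents in insertion order. It assumes, as the explain output suggests, that a constant-score query gives every match the product of the query boost and queryNorm, and that ties fall back to index order. The function name and data layout are made up for illustration.

```python
# Hypothetical illustration: a constant-score query gives every match the same
# score (query boost * queryNorm), so index-time boosts never enter the result.
def constant_score(query_boost=1.0, query_norm=1.0):
    return query_boost * query_norm

# (id, index-time document boost) in insertion order, as in the example above.
indexed = [("3007WFP", 1.0), ("SP2514N", 100.0), ("6H500F0", 1.0)]

# The index-time boost is deliberately ignored when scoring.
scored = [(doc_id, constant_score()) for doc_id, _boost in indexed]

# sorted() is stable, so documents with tied scores keep their index order.
ranked = [doc_id for doc_id, _ in sorted(scored, key=lambda p: p[1], reverse=True)]
print(ranked)
```

SP2514N stays in the middle even though it was indexed with a boost of 100, which matches what the debug output shows.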

[...]
> I would expect to see a boost of 100 on the SP2514N, in the
> explanation. Should I get that?

Even if index-time boosts were factored in, you wouldn't see an
explicit 'boost'.
The index-time boost is multiplied by the length normalization factor
for the field and the product is the "norm".
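
As a rough sketch of the arithmetic described above: the function names are hypothetical, and the length normalization of 1/sqrt(number of terms) is an assumption based on Lucene's classic default similarity. Real Lucene also compresses the result into a single byte, which this sketch ignores.

```python
import math

def length_norm(num_terms):
    # Assumed default: shorter fields get a larger normalization factor.
    return 1.0 / math.sqrt(num_terms)

def field_norm(doc_boost, field_boost, num_terms):
    # Index-time boosts are folded into one per-field number, the "norm".
    return doc_boost * field_boost * length_norm(num_terms)

# A document boosted by 100 whose field holds 4 terms.
print(field_norm(doc_boost=100.0, field_boost=1.0, num_terms=4))
```

Only queries that consult the norm when scoring (term queries, for instance) are affected by it, which is why no separate "boost" line would show up in the explanation.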

> I would also expect it to be at the
> head of the list, but I think I'm seeing the docs in insertion order.
> (if I insert hd.xml before monitor.xml, I get them in insertion order
> in that case as well.)

I'd recommend only using index-time boosting when you can't get the
relevance you want with query boosting and scoring.

-Yonik

Re: boosts?

TomSolrList
Hi Yonik,

Thanks for the quick response.

At 07:45 AM 12/28/2006, you wrote:
>On 12/27/06, Tom <[hidden email]> wrote:
>>I'm having a problem getting boosts to work the way I think they are
>>supposed to.
>
>Do you have a specific relevance problem you are trying to solve, or
>just testing things out?

Specific problem.

Frequently our users will start by specifying a facet, such as a date
range, geo location, etc. At this point I don't have any positive
query terms, just constant score range queries that are used to
eliminate things the user is not interested in.  So at this point,
there's nothing to be relevant to, so I need to pick some ordering.
Since I have information about which results tend to be more
interesting in the general case, I've set boosts on the documents.
I'd like to order by that, until the user gives me more information.

For an example, think of amazon ordering by "best selling", when the
user asks for books published since Dec. 1st. You don't yet know what
is relevant to this user's query, since all you have is "since Dec
1st", but you want to give an order more reasonable than "doc
number", or "date published".

>>What I want is for documents to be returned in doc boost order, when
>>all the queries are constant scoring range queries. (e.g.
>>date:[2006 TO 2007])
>
>They are *constant scoring* range queries :-)  Index-time boosts
>currently don't factor in.

Gotcha. I think I misinterpreted an earlier post (which did say
"query boost"). I was thinking it would include index time boost, too.


>I'd recommend only using index-time boosting when you can't get the
>relevance you want with query boosting and scoring.

I'm not sure how I'd do it that way.

What I want (what I _think_ I want :-) is a way to specify a default
order for results, for the cases where the user has only provided
exclusion information. In this case, I'm doing a match all docs, with
filter queries.

Tom


Re: boosts?

Mike Klaas
On 12/28/06, Tom <[hidden email]> wrote:

> >I'd recommend only using index-time boosting when you can't get the
> >relevance you want with query boosting and scoring.
>
> I'm not sure how I'd do it that way.
>
> What I want (what I _think_ I want :-) is a way to specify a default
> order for results, for the cases where the user has only provided
> exclusion information. In this case, I'm doing a match all docs, with
> filter queries.

Could you index your documents in the desired order?  This is the
default sort order.

If not, you can add a field that is present in all documents, and add
this as part of the query.  Then you can fiddle with the index-time
field boost to alter the results (without skewing queries that have a
meaningful relevancy score as using document boosts would do).

-Mike
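
One way the second suggestion might look in practice is sketched below. It builds a Solr XML add message in which every document carries the same marker field, with the ordering signal attached as an index-time field boost. The field name `alldocs`, its value `1`, and the helper name are assumptions made for illustration; the field would also need to be indexed with norms enabled in the schema.

```python
import xml.etree.ElementTree as ET

def add_doc_xml(doc_id, rank_boost):
    """Build an <add> message whose marker field carries an index-time boost."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", {"name": "id"}).text = doc_id
    # "alldocs" is a hypothetical field holding the same token in every document.
    marker = ET.SubElement(doc, "field", {"name": "alldocs", "boost": str(rank_boost)})
    marker.text = "1"
    return ET.tostring(add, encoding="unicode")

xml_doc = add_doc_xml("SP2514N", 100.0)
print(xml_doc)
```

A query such as `alldocs:1` combined with the range restrictions would then match every document, while the field's norm (and so its boost) influences the score.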

Re: boosts?

TomSolrList
At 12:03 PM 12/28/2006, you wrote:
>On 12/28/06, Tom <[hidden email]> wrote:
>Could you index your documents in the desired order?  This is the
>default sort order.

I don't think I can control document order, as documents may get
edited after creation.

>If not, you can add a field that is present in all documents, and add
>this as part of the query.  Then you can fiddle with the index-time
>field boost to alter the results (without skewing queries that have a
>meaningful relevancy score as using document boosts would do).

That seems to work. Thanks!

I'll probably do it that way, but... :-)

I was looking at how I would write a modified version of
MatchAllDocsQuery that would simply return the document's boost as the
score. But I haven't really figured out Lucene scoring.

Could someone explain how one would do something like this? I'm just
trying to understand how one might do custom scoring in Lucene, so
I'm more looking for concepts than code.

Thanks!

Tom


Re: boosts?

Chris Hostetter-3
: >If not, you can add a field that is present in all documents, and add
: >this as part of the query.  Then you can fiddle with the index-time
: >field boost to alter the results (without skewing queries that have a
: >meaningful relevancy score as using document boosts would do).
:
: That seems to work. Thanks!

maybe i'm missing something, but it sounds like what you want is a simple
sort on a numeric field -- whatever value you are trying to use as the
index time boost, you can just set as a field value instead and then sort
on it right?
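
A small sketch of what such a request could look like, reusing the host and range query from the first post. It assumes a Solr version that accepts a separate `sort` request parameter, and it uses the `popularity` field from the example documents as a stand-in for the ranking value. The helper name is hypothetical.

```python
from urllib.parse import urlencode

def select_url(base, query, sort_field, rows=10):
    # Sort descending so the largest ranking value comes first.
    params = {"q": query, "sort": f"{sort_field} desc", "rows": rows}
    # safe="*" keeps the wildcard readable, as in the URL from the first post.
    return f"{base}/select/?{urlencode(params, safe='*')}"

url = select_url("http://fred:8983/solr", "id:[* TO *]", "popularity")
print(url)
```

Because the ordering comes from an ordinary field value, it does not interfere with relevance scoring once the user adds real query terms.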

: I was looking at how I would write a modified version of
: MatchAllDocsQuery that would simply return the document's boost as the
: score. But I haven't really figured out Lucene scoring.

document boosts aren't maintained in the index ... they are multiplied by
the various field boosts and lengthNorms and stored on a per field basis.
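
A short sketch of why the original document boost cannot be read back from the index under this scheme: different combinations of boost and field length can produce the same stored value. The formula is an assumption (document boost times field boost times 1/sqrt of the term count, as in Lucene's classic default similarity), and the function name is hypothetical.

```python
import math

def stored_norm(doc_boost, field_boost, num_terms):
    # Only this product is kept per field; its factors are not stored separately.
    return doc_boost * field_boost / math.sqrt(num_terms)

# A document boosted by 2 with a 4-term field...
a = stored_norm(doc_boost=2.0, field_boost=1.0, num_terms=4)
# ...is indistinguishable from an unboosted document with a 1-term field.
b = stored_norm(doc_boost=1.0, field_boost=1.0, num_terms=1)
print(a == b)
```

This is the reason a MatchAllDocsQuery variant has no stored document boost to return as the score.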




-Hoss


Re: boosts?

TomSolrList
At 06:03 PM 12/28/2006, you wrote:
>maybe i'm missing something, but it sounds like what you want is a simple
>sort on a numeric field -- whatever value you are trying to use as the
>index time boost, you can just set as a field value instead and then sort
>on it right?

Yes.

I had just been thinking about it in terms of how to use the
info I already had in the index. But making another field works, too,
and is probably simpler.


>: I was looking at how I would write a modified version of
>: MatchAllDocsQuery that would simply return the document's boost as the
>: score. But I haven't really figured out Lucene scoring.
>
>document boosts aren't maintained in the index ... they are multiplied by
>the various field boosts and lengthNorms and stored on a per field basis.

Thanks! I had seen comments that the doc boost wasn't stored, but
didn't know how it worked.

Tom