date boosting and dismax


date boosting and dismax

Shawn Heisey-4
I've started a couple of previous threads on this topic, but I did not
have a good date field in my index to use at the time.  I now have a
schema with the document's post_date in tdate format, so I would like to
actually do some implementation.  Right now, we are not doing relevancy
ranking at all - we sort by descending post_date.  We have been working
on our application code so we can switch to dismax and use relevancy,
but it's still important to have a small bias towards newer content.

The idea is nothing this list hasn't heard before - to give newer
documents a slight relevancy boost.  An important sub-goal is to ensure
that the adjustment doesn't render Solr's caches useless.  I'm thinking
that this means that at a minimum, I need to round dates to a resolution
of 1 day, but if it's doable, 1 week might be even better.  I do like
the idea of having different boosts for different time ranges.

Can anyone give me a starting point on how to do this?  I will need
actual URL examples and dismax configuration snippets.

Thanks,
Shawn
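
A small illustration of the rounding idea in the message above, as a sketch rather than anything Solr-specific: if the timestamp used in a boost is rounded down to the day, every request made on the same day sees the same value, so anything cached against that value can be reused. In Solr date math the rounding is written NOW/DAY; the helper name round_down_to_day and the two sample timestamps are made up for this example.

```python
from datetime import datetime, timezone


def round_down_to_day(ts: datetime) -> datetime:
    """Drop the time-of-day part, mimicking what NOW/DAY does to NOW."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


# Two hypothetical request times on the same day.
morning = datetime(2010, 7, 14, 9, 15, 2, tzinfo=timezone.utc)
evening = datetime(2010, 7, 14, 22, 40, 51, tzinfo=timezone.utc)

# Unrounded, the two timestamps differ, so a boost built on them differs too.
print(morning == evening)
# Rounded to the day, both requests share one value.
print(round_down_to_day(morning) == round_down_to_day(evening))
```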


RE: date boosting and dismax

Knaak
I used this before my search term and it works well:

{!boost b=recip(ms(NOW,publishdate),3.16e-11,1,1)}

It's enough that when I search for *:* the articles appear in
chronological order.

Tim
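
To see what that function does to scores, here is a rough sketch in Python. It assumes Solr's recip(x,m,a,b) computes a/(m*x+b) and that ms(NOW,publishdate) is the document's age in milliseconds. Since 3.16e-11 is close to 1 divided by the number of milliseconds in a year, the boost works out to roughly 1/(1 + age in years). The names date_boost and MS_PER_YEAR are invented for this example.

```python
# Approximate number of milliseconds in one year.
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000


def recip(x, m, a, b):
    """Reciprocal function as assumed here: a / (m*x + b)."""
    return a / (m * x + b)


def date_boost(age_ms, m=3.16e-11, a=1.0, b=1.0):
    """Boost for a document that is age_ms milliseconds old."""
    return recip(age_ms, m, a, b)


# Print the multiplier for a few document ages.
for years in (0, 0.5, 1, 2, 5):
    boost = date_boost(years * MS_PER_YEAR)
    print(f"{years:>4} years old -> boost {boost:.3f}")
```

A brand-new document keeps its full score (multiplier 1), a one-year-old document gets about half, and the multiplier keeps shrinking smoothly after that.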

-----Original Message-----
From: Shawn Heisey [mailto:[hidden email]]
Sent: Wednesday, July 14, 2010 11:47 AM
To: [hidden email]
Subject: date boosting and dismax


  I've started a couple of previous threads on this topic, but I did not

have a good date field in my index to use at the time.  I now have a
schema with the document's post_date in tdate format, so I would like to

actually do some implementation.  Right now, we are not doing relevancy
ranking at all - we sort by descending post_date.  We have been working
on our application code so we can switch to dismax and use relevancy,
but it's still important to have a small bias towards newer content.

The idea is nothing this list hasn't heard before - to give newer
documents a slight relevancy boost.  An important sub-goal is to ensure
that the adjustment doesn't render Solr's caches useless.  I'm thinking
that this means that at a minimum, I need to round dates to a resolution

of 1 day, but if it's doable, 1 week might be even better.  I do like
the idea of having different boosts for different time ranges.

Can anyone give me a starting point on how to do this?  I will need
actual URL examples and dismax configuration snippets.

Thanks,
Shawn


Re: date boosting and dismax

Shawn Heisey-4
One of the replies I got on a previous thread mentioned range queries,
with this example:

[NOW-6MONTHS TO NOW]^5.0
[NOW-1YEARS TO NOW-6MONTHS]^3.0
[NOW-2YEARS TO NOW-1YEARS]^2.0
[* TO NOW-2YEARS]^1.0

Something like this seems more flexible, and I inferred from it that
the performance would be better than the boost function you've shown,
but I don't know how to actually put it into a URL or handler config.

I also seem to remember seeing something about how to do "less than" in
range queries as well as the "less than or equal to" implied by the
above, but I cannot find it now.

Thanks,
Shawn
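
One possible way to put those ranges into a URL, offered as a sketch and not a tested configuration: the dismax parser accepts one or more bq (boost query) parameters, so each date range can be sent as its own bq clause on the date field. The field name post_date comes from the first message; the qf field list, the base URL, and the NOW/DAY rounding (used so the clauses stay identical within a day) are assumptions made for this example.

```python
from urllib.parse import urlencode

# Assumed values for illustration only.
BASE_URL = "http://localhost:8080/apache-solr-1.4.0/select"
DATE_FIELD = "post_date"
QUERY_FIELDS = "title body"  # hypothetical qf field list

# (range, boost) pairs modeled on the example above, rounded to the day.
TIERS = [
    ("[NOW/DAY-6MONTHS TO NOW/DAY+1DAY]", 5.0),
    ("[NOW/DAY-1YEARS TO NOW/DAY-6MONTHS]", 3.0),
    ("[NOW/DAY-2YEARS TO NOW/DAY-1YEARS]", 2.0),
    ("[* TO NOW/DAY-2YEARS]", 1.0),
]


def build_url(user_query):
    """Build a dismax request URL with one bq parameter per date tier."""
    params = [("defType", "dismax"), ("q", user_query), ("qf", QUERY_FIELDS)]
    params += [("bq", f"{DATE_FIELD}:{rng}^{boost}") for rng, boost in TIERS]
    # urlencode escapes '+', '[', ':' and '^' so they survive the HTTP request.
    return BASE_URL + "?" + urlencode(params)


print(build_url("findthis"))
```

The escaping matters: a literal + in a URL is read as a space, so NOW/DAY+1DAY has to be sent as NOW/DAY%2B1DAY. Note also that bq clauses add to the score and do not multiply it, so the boost values would need tuning against real relevancy scores.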


On 7/14/2010 10:26 AM, Tim Gilbert wrote:
> I used this before my search term and it works well:
>
> {!boost b=recip(ms(NOW,publishdate),3.16e-11,1,1)}
>
> It's enough that when I search for *:* the articles appear in
> chronological order.
>
> Tim


RE: date boosting and dismax

Knaak
Re: flexibility.

This boost does decay over time: the further a document's date gets
from now, the less of a boost it receives.  You are right though, it
doesn't allow a fine degree of control, particularly if you don't want
the boost to decay smoothly.  I hadn't considered your suggestion, so
I'll keep it in mind if the need arises.

Re:  Adding boost to query:

I am no expert, but I did this and it worked:

SolrJ:  solrQuery.setQuery("{!boost b=recip(ms(NOW,publishdate),3.16e-11,1,1)} " + queryparam);

Where queryparam is what you are searching for.  You quite literally
just prepend it.


Via http://localhost:8080/apache-solr-1.4.0/select, just prepend it to
your q= like this:
        q={!boost+b%3Drecip(ms(NOW,publishdate),3.16e-11,1,1)}+findthis

Tim
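
For anyone building these URLs from code instead of by hand, a minimal Python sketch of the same prepend-and-encode step follows. Two things in it are assumptions that go beyond the message above: NOW is replaced with NOW/DAY so the function value only changes once a day, and the second helper passes the same function through the dismax bf parameter, which adds to the score where {!boost} multiplies it. The qf field list is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8080/apache-solr-1.4.0/select"
# Same function as above, with NOW rounded to the day (an assumption).
BOOST_FUNC = "recip(ms(NOW/DAY,publishdate),3.16e-11,1,1)"


def boosted_lucene_url(user_query):
    """Prepend the {!boost} local params to the query, then URL-encode it."""
    q = "{!boost b=" + BOOST_FUNC + "}" + user_query
    return BASE_URL + "?" + urlencode({"q": q})


def boosted_dismax_url(user_query, qf="title body"):
    """Send the same function as a dismax bf (additive boost function)."""
    params = {"defType": "dismax", "q": user_query, "qf": qf, "bf": BOOST_FUNC}
    return BASE_URL + "?" + urlencode(params)


print(boosted_lucene_url("findthis"))
print(boosted_dismax_url("findthis"))
```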

-----Original Message-----
From: Shawn Heisey [mailto:[hidden email]]
Sent: Wednesday, July 14, 2010 1:16 PM
To: [hidden email]
Subject: Re: date boosting and dismax



Re: date boosting and dismax

Jonathan Rochkind
In reply to this post by Shawn Heisey-4
Shawn Heisey wrote:
> [* TO NOW-2YEARS]^1.0
>  

> I also seem to remember seeing something about how to do "less than" in
> range queries as well as the "less than or equal to" implied by the
> above, but I cannot find it now.
>  
Ranges with square brackets [] are inclusive. Ranges with curly braces {}
are exclusive.  And you have a less than example above:

[* TO value]   is a 'less than or equal to value' (inclusive)
{* TO value}   is a 'less than not including value' (exclusive)

Now, if you want inclusive on one end but exclusive on the other, I'm
pretty sure you're out of luck. :)

Jonathan