Why is multiplicative boost preferred over additive?

14 messages

Why is multiplicative boost preferred over additive?

Hullegård, Jimi
Hi,

After reading a bit on various sites, and especially the blog post "Comparing boost methods in Solr", it seems that multiplicative boosting is generally preferred over additive boosting. But I can't really get my head around *why* that is so, since in most boosting problems I can think of, an additive boost seems to suit better.

For example, in our project we want to boost documents depending on various factors, but in essence they can be summarized as:

- Regular edismax logic, like qf=title^2 mainText^1
- Multiple custom document fields, with weights specified at query time

So, first off, the custom fields... It became obvious to me quite quickly that multiplicative logic here would totally ruin the purpose of the weights, since multiplication is commutative: something like "(f1 * w1) * (f2 * w2)" is the same as "(f1 * w2) * (f2 * w1)". So I ended up using an additive boost here.
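A minimal Python sketch of that commutativity problem, with made-up values for f1, f2, w1 and w2: swapping the weights leaves the product unchanged, because the weights collapse into the single constant factor w1*w2, while the weighted sum does change.

```python
import math

# Hypothetical field values and query-time weights.
f1, f2 = 0.9, 0.2
w1, w2 = 1.0, 3.0

# Multiplicative: (f1*w1)*(f2*w2) == f1*f2*(w1*w2), so swapping weights changes nothing.
mult_original = (f1 * w1) * (f2 * w2)
mult_swapped = (f1 * w2) * (f2 * w1)

# Additive: each weight stays attached to its own field.
add_original = f1 * w1 + f2 * w2
add_swapped = f1 * w2 + f2 * w1

print(math.isclose(mult_original, mult_swapped))  # True
print(math.isclose(add_original, add_swapped))    # False
```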

Then we have the combination of the edismax score and my custom boost. As far as I understand it, when using the boost parameter with edismax, this combination is always performed using multiplicative logic. But the same problem exists here as it did with my custom fields: if I multiply the aggregated result of the custom fields by some weight, it doesn't affect the order of the documents, because that weight scales the edismax score just as much. What I want is for the weight to influence only my custom boost value, so that I can control how much (or how little) the final score is affected by the custom boost.

So, in both cases I find myself wanting to use the additive boost. But surely I must be missing something, right? Am I thinking backwards or something?

I don't use any of the out-of-the-box example indexes, so I can't provide you with a working URL that shows exactly what I am doing. But in essence my query looks like this:

- q=test
- defType=edismax
- qf=title^2&qf=mainText1^1
- totalRanking=div(sum(product(random1,1),product(random2,1.5),product(random3,2),product(random4,2.5),product(random5,3)),5)
- weightedTotalRanking=product($totalRanking,1.5)
- bf=$weightedTotalRanking
- fl=*,score,[explain style=text],$weightedTotalRanking

random1 to random5 are document fields of type double, with random values between 0.0 and 1.0.

With this setup, I can change the overall importance of my custom boosting using the factor in weightedTotalRanking (1.5 above). But that is only because bf is additive. If I switch to the boost parameter, I can no longer influence the order of the documents using this factor, no matter how high a value I choose.
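To make the difference concrete, here is a small simulation in plain Python (not Solr) with three hypothetical documents, each with a made-up text score and totalRanking value. It assumes bf behaves roughly like score + k*ranking and the boost parameter roughly like score * (k*ranking), where k plays the role of the 1.5 factor in weightedTotalRanking.

```python
# Hypothetical (text_score, total_ranking) pairs for three documents.
docs = {
    "A": (2.0, 0.1),
    "B": (1.0, 0.5),
    "C": (0.5, 0.9),
}

def additive(score, ranking, k):
    """Roughly what bf does: the weighted ranking is added to the text score."""
    return score + k * ranking

def multiplicative(score, ranking, k):
    """Roughly what boost does: the text score is multiplied by the weighted ranking."""
    return score * (k * ranking)

def order(combine, k):
    """Document ids sorted by combined score, best first."""
    return sorted(docs, key=lambda d: combine(*docs[d], k), reverse=True)

for k in (0.5, 1.5, 10.0):
    print(f"k={k}: bf-style {order(additive, k)}, boost-style {order(multiplicative, k)}")
```

With the additive combination, raising k moves documents with a high ranking upward; with the multiplicative one, k multiplies every document's score by the same constant, so the order never changes.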

Am I looking at this the wrong way? Is there a much better approach to achieve what I want?

Regards
/Jimi

Re: Why is multiplicative boost preferred over additive?

Walter Underwood
Think about using popularity as a boost. If one movie has a million rentals and one has a hundred rentals, there is no additive formula that balances that with text relevance. Even with log(popularity), it doesn’t work.

With a multiplicative boost, we care about the difference between the one rented one million times and the one rented 800 thousand times (think about the Twilight movies at Netflix), but it also distinguishes between the one rented 100 times and the one rented 80 times.
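A rough sketch of the arithmetic behind this, using the rental counts above and assuming the additive variant normalizes popularity by dividing by the maximum: the additive bonus separates the two popular movies by about 0.2 but the two obscure ones by only about 0.00002, while the ratio a multiplicative boost sees is 1.25 at both ends.

```python
# Rental counts from the example above.
counts = {"hit": 1_000_000, "near_hit": 800_000, "niche": 100, "near_niche": 80}
max_count = max(counts.values())

# Additive variant (assumption): popularity normalized as count / max_count and
# added to the score, so what separates two docs is an absolute difference.
normalized = {name: c / max_count for name, c in counts.items()}
top_gap = normalized["hit"] - normalized["near_hit"]        # about 0.2
tail_gap = normalized["niche"] - normalized["near_niche"]   # about 0.00002

# Multiplicative variant: what separates two docs is the ratio of their boosts.
top_ratio = counts["hit"] / counts["near_hit"]
tail_ratio = counts["niche"] / counts["near_niche"]

print(f"additive gaps: {top_gap:.5f} vs {tail_gap:.5f}")
print(f"multiplicative ratios: {top_ratio} vs {tail_ratio}")
```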

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)


> On Mar 17, 2016, at 11:29 AM, [hidden email] wrote:
> [...]


Re: Why is multiplicative boost preferred over additive?

Malcolm Upayavira Holmes
Yes. Boosting adjusts an existing score. That original score can vary,
e.g. depending upon how many search terms there are.

If you use additive boosting, adding a boost to a search with one term
(whose scores might fall between 0 and 1) has a different effect from
adding the same boost to a search with four terms (scores between 0 and
4). If, however, you use multiplicative boosting, the impact of the
boosts is the same.

If, for example, you want to add a recency boost, say with recip, where
the boost value is between 0 and 1, then use score*(1+boost). This way,
a boost of 0 has no effect on the score, whereas a boost of 1 doubles
the score. If you use plain multiplicative here, a boost of 0 wipes out
the score entirely, which can have nasty effects (it has, at least, for
me).
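A minimal sketch of the score*(1+boost) pattern in Python. The recip function below mirrors the formula of Solr's recip(x,m,a,b) function query, a/(m*x+b); the 30-day decay and the two example scores are assumptions for illustration.

```python
def recip(x, m, a, b):
    """Same formula as Solr's recip(x,m,a,b) function query: a / (m*x + b)."""
    return a / (m * x + b)

def recency_boost(age_days):
    """Hypothetical recency boost: 1.0 for a brand-new doc, 0.5 at 30 days."""
    return recip(age_days, 1 / 30, 1.0, 1.0)

def boosted(score, boost):
    """score * (1 + boost): boost 0 leaves the score alone, boost 1 doubles it."""
    return score * (1.0 + boost)

# Relative gain from the same boost on a low-scoring and a high-scoring query.
for score in (0.8, 3.2):  # e.g. a one-term and a four-term query
    b = recency_boost(0)
    additive_gain = (score + b) / score              # depends on the score range
    multiplicative_gain = boosted(score, b) / score  # always 1 + b
    print(f"score {score}: additive x{additive_gain:.2f}, multiplicative x{multiplicative_gain:.2f}")
```

On the Solr side, one way to express the same idea with edismax would be something like boost=sum(1,recip(ms(NOW,pubdate),3.16e-11,1,1)), where pubdate is a hypothetical date field; recip(ms(NOW,...),3.16e-11,1,1) is the recency example commonly shown in Solr's function query documentation.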

Upayavira

On Thu, 17 Mar 2016, at 06:58 PM, Walter Underwood wrote:

> [...]

Re: Why is multiplicative boost preferred over additive?

Jan Høydahl / Cominvent
You can also use functions to “compress” the source number, so that the
effect of a certain boost becomes bigger or smaller compared to the other
boost you have.
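As an illustration of that compression idea, here is a short Python comparison of how much a raw popularity value, its square root, and log10(1+x) separate a very popular item from an obscure one. The 1.25 million and 100 figures are the queue counts Walter mentions elsewhere in this thread; sqrt and log are just two possible compressing functions, both of which also exist as Solr function queries.

```python
import math

# Queue counts for a very popular and an obscure title.
popular, niche = 1_250_000, 100

compressors = {
    "raw": lambda x: x,
    "sqrt": math.sqrt,
    "log10(1+x)": lambda x: math.log10(1 + x),
}

# How many times larger the popular item's boost is, per compression function.
ratios = {name: f(popular) / f(niche) for name, f in compressors.items()}

for name, ratio in ratios.items():
    print(f"{name:>11}: {ratio:,.1f}x")
```

The stronger the compression, the less the popularity boost can override differences in text relevance.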

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 17 Mar 2016, at 23:21, Upayavira <[hidden email]> wrote:
> [...]


RE: Why is multiplicative boost preferred over additive?

Hullegård, Jimi
In reply to this post by Walter Underwood
On Thursday, March 17, 2016 7:58 PM, [hidden email] wrote:
>
> Think about using popularity as a boost. If one movie has a million rentals and one has a hundred rentals, there is no additive formula that balances that with text relevance. Even with log(popularity), it doesn't work.

I'm not sure I follow your logic now. If one can express the popularity as a value between 0.0 and 1.0, why can't one use that, together with a weight (indicating how much the popularity should influence the score, in general) and add that to the text relevance score? And how, exactly, would I achieve that using any multiplicative formula?

The logic of the weight, in this case, is that I want to be able to tweak how much influence the popularity has on the final score (and thus the sort order of the documents), where a weight of 0.0 would have the same effect as if the popularity wasn't included in the boost logic at all, and a high enough weight would have the same effect as if one sorted the documents solely on popularity.
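One possible multiplicative formulation with those weight semantics (an assumption, not something proposed elsewhere in the thread) is to raise a popularity factor to the weight: score * (1 + pop)^w. A weight of 0 turns the factor into 1, and as w grows the popularity factor eventually dominates any fixed difference in text scores. The document names and values below are hypothetical, and pop is assumed to be normalized to 0.0-1.0.

```python
# Hypothetical docs: (text_score, popularity normalized to 0.0-1.0).
docs = {"relevant": (3.0, 0.1), "balanced": (2.0, 0.5), "popular": (1.0, 0.9)}

def weighted(score, pop, w):
    """score * (1 + pop) ** w: w = 0 leaves the score unchanged,
    larger w lets popularity dominate the ordering."""
    return score * (1.0 + pop) ** w

def ranking(w):
    """Document names sorted by weighted score, best first."""
    return sorted(docs, key=lambda d: weighted(*docs[d], w), reverse=True)

for w in (0, 2, 10):
    print(w, ranking(w))
```

Using 1 + pop rather than pop avoids a zero popularity wiping out the score entirely.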

/Jimi

Re: Why is multiplicative boost preferred over additive?

Shawn Heisey-2
On 3/18/2016 6:34 AM, [hidden email] wrote:
> [...]

Restating Walter's point in a different way:

The "max score" of a particular query can vary widely, and only has
meaning within the context of that query.  One query on an index might
produce a max score of 0.944, so *every* document has a score less than
one, while another query *on the same index* (that might even have some
of the same result documents) might produce a max score of 12.7, so the
top docs have a score *much* higher than one.

If your additive boost is 5, this represents a relative boost of over
500 percent for the top docs of the first query I talked about above,
but less than 50% for the top docs of the second.

If you have a multiplicative boost of 1.5, then the top docs of both
queries get the same relative boost: their scores rise by 50 percent,
to 150 percent of the original.

To use boosting successfully, you must have control of the *relative*
effect you are producing.  Multiplicative boosts *keep* things relative;
additive boosting makes assumptions about the max score, and those
assumptions may turn out to be completely wrong.
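A quick Python check of that arithmetic, using the two max scores from the example (0.944 and 12.7), an additive boost of 5 and a multiplicative boost of 1.5:

```python
def relative_increase(old, new):
    """Percentage by which a score went up."""
    return (new - old) / old * 100

def effect(max_score, add=5.0, mult=1.5):
    """Relative change of the top score under an additive and a multiplicative boost."""
    additive = relative_increase(max_score, max_score + add)
    multiplicative = relative_increase(max_score, max_score * mult)
    return additive, multiplicative

for max_score in (0.944, 12.7):
    additive, multiplicative = effect(max_score)
    print(f"max score {max_score}: +5 -> +{additive:.0f}%, x1.5 -> +{multiplicative:.0f}%")
```

The additive boost's relative effect swings by more than a factor of ten between the two queries, while the multiplicative one stays at the same 50 percent.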

Thanks,
Shawn


RE: Why is multiplicative boost preferred over additive?

Hullegård, Jimi
In reply to this post by Malcolm Upayavira Holmes
On Thursday, March 17, 2016 11:21 PM, [hidden email] wrote:
>
> If you use additive boosting, when you add a boost to a search with one term, (e.g. between 0 and 1)
> you get a different effect compared to when you add the same boost to a search with four terms (e.g. between 0 and 4).

Wouldn't that be solvable by multiplying my boost by the max value? I.e., in the search with one term my boost is multiplied by 1, and in the search with four terms it is multiplied by 4. Some kind of normalization should solve this, right?
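A minimal sketch of that normalization idea, assuming the max score of each query is already known (getting hold of it is outside this sketch): divide each score by the max score before adding the boost, which ranks the same way as multiplying the boost by the max score. The two queries and their scores are hypothetical.

```python
def plain_additive(scores, boosts):
    """Add the boost directly to the raw score."""
    return {d: scores[d] + boosts[d] for d in scores}

def normalized_additive(scores, boosts):
    """Divide by the query's max score first, then add the boost."""
    max_score = max(scores.values())
    return {d: scores[d] / max_score + boosts[d] for d in scores}

def top(combined):
    """Id of the highest-scoring document."""
    return max(combined, key=combined.get)

boosts = {"a": 0.0, "b": 0.3}
one_term = {"a": 1.0, "b": 0.8}    # scores roughly in 0-1
four_terms = {"a": 4.0, "b": 3.2}  # same relative scores, roughly in 0-4

print(top(plain_additive(one_term, boosts)), top(plain_additive(four_terms, boosts)))
print(top(normalized_additive(one_term, boosts)), top(normalized_additive(four_terms, boosts)))
```

Dividing every score in a result set by the same max score does not change their relative order, so the normalization only changes how large the boost is relative to the scores.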


> If, for example, you want to add a recency boost, say with recip, where the boost value is between 0 and 1,
> then use score*(1+boost). This way, a boost of 0 has no effect on the score, whereas a boost of 1 doubles the score.
> If you use plain multiplicative here, a boost of 0 wipes out the score entirely, which can have nasty effects (it has, at least, for me).

I understand what you mean, but can you still call that a multiplicative function? Because score*(1+boost) is the same as score + score*boost. I.e., you basically take your boost, multiply it by the original score, and then *add* the original score.

But sure, I guess it technically still counts as multiplicative, in a way, since you can achieve it using the boost parameter in edismax, which is declared as multiplicative.

/Jimi

Re: Why is multiplicative boost preferred over additive?

Walter Underwood
In reply to this post by Hullegård, Jimi
Popularity has a very wide range. Try my example, scale 1 million and 100 into the same 1.0-0.0 range. Even with log popularity.

As another poster pointed out, text relevance scores also have a wide range.

In practice, I never could get additive boost to work right at Netflix at both ends of the popularity scale. I gave up and made it work for popular movies. Here at Chegg, multiplicative boost works fine.

Don’t think so much about the absolute values of the scores. All we care about is ordering. Work with real user queries, not with theory.

wunder


> On Mar 18, 2016, at 5:34 AM, <[hidden email]> <[hidden email]> wrote:
> [...]


RE: Why is multiplicative boost preferred over additive?

Hullegård, Jimi
In reply to this post by Shawn Heisey-2
On Friday, March 18, 2016 2:19 PM, [hidden email] wrote:
>
> The "max score" of a particular query can vary widely, and only has meaning within the context of that query.  
> One query on an index might produce a max score of 0.944, so *every* document has a score less than one,
> while another query *on the same index* (that might even have some of the same result documents)
> might produce a max score of 12.7, so the top docs have a score *much* higher than one.
>
> If your additive boost is 5, this represents a relative boost of over 500 percent for the top docs
> of the first query I talked about above, but less than 50% for the top docs of the second.

Thanks Shawn. I think I understand. I guess I was stuck in the mindset of having all original scores within a defined interval.

Although I still don't fully understand why Solr can't normalize the score so that it always falls between, say, 0.0 and 100.0. Surely Solr knows what the maximum "raw score" is.

Sure, I have read the page "Scores As Percentages", but the main argument there against a normalized score seems to be that it still doesn't make different queries truly "comparable", but that's not what I'm after anyway. I would only use the normalized score in my own boost calculation, nothing else.

But, anyway... Since the score(1+boost...) suggestion from Upayavira solves the problem with weights, I guess I will start using multiplicative boosts now. :)

But it would be nice to see how other people handle weighted boosts. In general, I find it a bit hard to find concrete examples of queries that combine multiple boost factors (like date recency, popularity, document type etc). Most documentation seems to focus on *one* factor only, like "this is how you sort/score based on popularity" or "this is how you get more recent documents first".

/Jimi

RE: Why is multiplicative boost preferred over additive?

Hullegård, Jimi
In reply to this post by Walter Underwood
On Friday, March 18, 2016 3:53 PM, [hidden email] wrote:
>
> Popularity has a very wide range. Try my example, scale 1 million and 100 into the same 1.0-0.0 range. Even with log popularity.

Well, in our case, we don't really care to differentiate between documents with low popularity. And if we know roughly what the popularity distribution is, it is not hard to normalize it to a value between 0.0 and 1.0. The simplest approach is to focus on the maximum value and map it to 1.0, so the normalization function is basically: normalizedValue=value/maxValue. But knowing the mean and median, or other statistical information, one could of course use a more advanced calculation.
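A small sketch of the value/maxValue normalization described above, next to one alternative that compresses with log1p first (the log variant is an assumption, not part of the proposal). The 1,000,000 / 800,000 / 100 / 80 counts are borrowed from Walter's example.

```python
import math

def linear_norm(value, max_value):
    """normalizedValue = value / maxValue."""
    return value / max_value

def log_norm(value, max_value):
    """Alternative: log-compress before dividing, so the long tail keeps some spread."""
    return math.log1p(value) / math.log1p(max_value)

max_views = 1_000_000
for views in (1_000_000, 800_000, 100, 80):
    print(views, round(linear_norm(views, max_views), 5), round(log_norm(views, max_views), 3))
```

With value/maxValue the two low-traffic items end up almost indistinguishable near 0.0, which fits a case where low popularity does not matter; the log variant keeps them apart.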

In essence, if one can answer the question "How popular is this document/movie/item?" using "extremely popular", "very popular", "quite popular", "average", "not very popular" and "very unpopular" (i.e. popularity normalized down to 6 possible values), it should not be that hard to normalize the popularity to a value between 0.0 and 1.0.
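The six-level idea can be sketched with Python's bisect module. The page-view thresholds here are made up; in practice they would come from the actual popularity distribution.

```python
import bisect

# Hypothetical page-view thresholds between the six levels, from
# "very unpopular" (level 0) up to "extremely popular" (level 5).
THRESHOLDS = [10, 100, 1_000, 10_000, 100_000]

def bucket_popularity(views):
    """Map raw page views to one of 6 evenly spaced values in 0.0-1.0."""
    level = bisect.bisect_right(THRESHOLDS, views)  # 0..5
    return level / len(THRESHOLDS)

for views in (3, 500, 50_000, 2_000_000):
    print(views, bucket_popularity(views))
```

Bucketing deliberately throws away differences within a level, which matches the goal of not caring about exact positions among documents of similar popularity.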

/Jimi

Re: Why is multiplicative boost preferred over additive?

Walter Underwood
That works fine if you have a query that matches things with a wide range of popularities. But that is the easy case.

What about the query “twilight”, which matches all the Twilight movies, all of which are popular (millions of views). Or “Lord of the Rings” which only matches movies with hundreds of views? People really will notice when the 1978 animated version shows up before the Peter Jackson films.

wunder


> On Mar 18, 2016, at 8:18 AM, <[hidden email]> <[hidden email]> wrote:
> [...]


RE: Why is multiplicative boost preferred over additive?

Hullegård, Jimi
On Friday, March 18, 2016 4:25 PM, [hidden email] wrote:
>
> That works fine if you have a query that matches things with a wide range of popularities. But that is the easy case.
>
> What about the query "twilight", which matches all the Twilight movies, all of which are popular (millions of views).

Well, like I said, I focused on our use case. And we deal with articles, not movies. And the raw popularity value is basically just "the number of page views the last N days". We want to boost documents that many people have visited recently, but don't really care about the exact search result position when comparing documents with roughly the same popularity. So if all the matched documents have *roughly* the same popularity, then we basically don't want the popularity to influence the score much at all.

> Or "Lord of the Rings" which only matches movies with hundreds of views? People really will notice when
> the 1978 animated version shows up before the Peter Jackson films.

Well, don't the Peter Jackson "Lord of the Rings" films have more than just a few hundred views?

/Jimi

Re: Why is multiplicative boost preferred over additive?

Walter Underwood
I used a popularity score based on the DVD being in people’s queues and the streaming views. The Peter Jackson films were DVD only. They were in about 100 subscriber queues. The first Twilight film was in 1.25 million queues.

Now think about the query “twilight zone”. How do you make “Twilight” not be the first hit for that?

wunder


> On Mar 18, 2016, at 8:48 AM, <[hidden email]> <[hidden email]> wrote:
> [...]


RE: Why is multiplicative boost preferred over additive?

Hullegård, Jimi
On Friday, March 18, 2016 5:11 PM, [hidden email] wrote:
>
> I used a popularity score based on the DVD being in people's queues and the streaming views.
> The Peter Jackson films were DVD only. They were in about 100 subscriber queues.
> The first Twilight film was in 1.25 million queues.
> Now think about the query "twilight zone". How do you make "Twilight" not be the first hit for that?

1. Maybe your popularity value should include more types of "views", both Netflix-internal (like total DVD rentals over the last X days, or since forever) and Netflix-external (like the total number of movie tickets sold, worldwide or in the same country).
2. Shouldn't the word "zone" exclude the Twilight movies altogether, or at least boost the results with that word in the title?
3. Maybe the popularity has too much influence on the score?
4. I never said my reasoning about normalizing popularity was applicable to your use case. On the contrary, like I said before, I focused on our own use case.

/Jimi