# Mahout/Taste covariance between two items

8 messages

## Mahout/Taste covariance between two items

hi guys, just wondering if you have a method implemented which would calculate the covariance between two items, and the variance of an item. I looked at itemSimilarities, but that one does something different.

thanks
Tama

## Re: Mahout/Taste covariance between two items

Yes. Look at PearsonCorrelationSimilarity. It implements ItemSimilarity so it can compute a Pearson correlation between ratings for two items.

Pearson is the covariance divided by the product of the standard deviations. So, just multiply the similarity value you get by the standard deviations of the items' preference values.

The variance of each item's preference values is simply the square of the standard deviation, if that's what you mean. You can use RunningAverageAndStdDev to help compute standard deviation if you like.

On Thu, Nov 26, 2009 at 3:14 PM, jamborta wrote:

> hi guys,
> just wondering if you have a method implemented which would calculate the covariance between two items. and the variance of an item. I looked itemSimilarities but that one does something different.
>
> thanks
> Tama
> --
> View this message in context: http://old.nabble.com/Mahout-Taste-covariance-between-two-items-tp26530825p26530825.html
> Sent from the Mahout User List mailing list archive at Nabble.com.
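
To make the "multiply the similarity by the standard deviations" step concrete, here is a minimal sketch in plain Java. It does not call Mahout at all: the class `CovarianceFromPearson` and its methods are hypothetical names, and the ratings are made-up values. It assumes the correlation and both standard deviations are computed over the same paired ratings (users who rated both items) and that the standard deviation is the sample one (dividing by n - 1); under those assumptions the product is the sample covariance.

```java
/** Sketch only: recover covariance from a Pearson correlation and two standard deviations. */
final class CovarianceFromPearson {

    static double mean(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            sum += x;
        }
        return sum / v.length;
    }

    /** Sample standard deviation (divides by n - 1). */
    static double sampleStdDev(double[] v) {
        double m = mean(v);
        double ss = 0.0;
        for (double x : v) {
            ss += (x - m) * (x - m);
        }
        return Math.sqrt(ss / (v.length - 1));
    }

    /** Pearson correlation over paired ratings x[i], y[i]. */
    static double pearson(double[] x, double[] y) {
        double mx = mean(x);
        double my = mean(y);
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - mx;
            double dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** covariance = correlation * sdX * sdY */
    static double covarianceFromPearson(double[] x, double[] y) {
        return pearson(x, y) * sampleStdDev(x) * sampleStdDev(y);
    }

    public static void main(String[] args) {
        // Hypothetical ratings from four users who rated both items.
        double[] itemA = {5, 3, 4, 1};
        double[] itemB = {4, 2, 5, 1};
        System.out.println("covariance = " + covarianceFromPearson(itemA, itemB));
        System.out.println("variance of A = " + Math.pow(sampleStdDev(itemA), 2));
    }
}
```

If the standard deviations are taken over all of an item's ratings rather than only the co-rated ones, the product is no longer exactly the covariance of the paired values.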

## Re: Mahout/Taste covariance between two items

great. thanks a lot.

srowen wrote:

> Yes. Look at PearsonCorrelationSimilarity. It implements ItemSimilarity so it can compute a Pearson correlation between ratings for two items. Pearson is the covariance divided by the product of the standard deviations. So, just multiply the similarity value you get by the standard deviations of the items' preference values. The variance of each item's preference values is simply the square of the standard deviation, if that's what you mean. You can use RunningAverageAndStdDev to help compute standard deviation if you like.

## Re: Mahout/Taste covariance between two items

hi. I tried to figure out how you calculate the Pearson correlation, but it looks like you use this formula:

sumXY / sqrt(sumX2 * sumY2)

where

sumXY = sumXY - meanY * sumX;
sumX2 = sumX2 - meanX * sumX;
sumY2 = sumY2 - meanY * sumY;

I don't really understand how you got these equations. Could you explain it to me? I thought Pearson correlation would be like this:

E(x_i-meanX)(y_i-meanY) / (sdX*sdY)

For my project I would need to get the sample correlation coefficient, which would be something like this:

sum(x_i-meanX)(y_i-meanY)/(N-1)

Could that just be derived from the values you've already calculated?

thanks a lot.

## Re: Mahout/Taste covariance between two items

On Fri, Nov 27, 2009 at 1:41 AM, jamborta wrote:

> hi. I tried to figure out how you calculate pearson correlation, but it looks like you use this formula:
>
> sumXY / sqrt(sumX2 * sumY2)

Yes, that's right. This is what Pearson reduces to when the means of X and Y are 0. And they are here: the implementation 'centers' the data.

> where
>
> sumXY = sumXY - meanY * sumX;
>
> sumX2 = sumX2 - meanX * sumX;
>
> sumY2 = sumY2 - meanY * sumY;

You see the lines commented out there? Those are the full forms of the expressions, which may make more sense. This is centering the data, making the mean 0. The shorter form is a simplification based on the observation that, for example, sumX * meanY = sumY * meanX = n * meanY * meanX.

> i don't really understand how you got these equations. could you explain it to me? I thought pearson correlation would be like this
>
> E(x_i-meanX)(y_i-meanY) / sdX*sdY

That's right, that's the expression for a population correlation, but we can really only compute a sample Pearson correlation coefficient, yes:

> for my project I would need to get sample correlation coefficient which would be something like this:
>
> sum(x_i-meanX)(y_i-meanY)/(N-1)

Yeah, that's fine too. This is another way of expressing the formula, though you're missing the two standard deviations in the denominator. It'll be clearer if I note that the means of X and Y are 0.
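
For readers following the algebra, the simplification described in this reply can be written out as below. This is a worked derivation rather than a quote from the Mahout source; the symbols $\bar{x}$, $\bar{y}$ and $n$ stand for meanX, meanY and the number of paired ratings, and each sum runs over $i = 1, \dots, n$.

```latex
\begin{align*}
\sum (x_i - \bar{x})(y_i - \bar{y})
  &= \sum x_i y_i - \bar{y} \sum x_i - \bar{x} \sum y_i + n \bar{x} \bar{y} \\
  &= \sum x_i y_i - \bar{y} \sum x_i
     && \text{since } \bar{x} \sum y_i = n \bar{x} \bar{y} \\[4pt]
\sum (x_i - \bar{x})^2 &= \sum x_i^2 - \bar{x} \sum x_i \\
\sum (y_i - \bar{y})^2 &= \sum y_i^2 - \bar{y} \sum y_i \\[4pt]
r &= \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}
          {\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}
\end{align*}
```

The first three results are the centered sumXY, sumX2 and sumY2 from the question. Any 1/(n - 1) factors from the covariance and the two standard deviations cancel in the ratio, which is why none appears in the final expression for r.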

## Re: Mahout/Taste covariance between two items

thank you. much clearer now. so for my purpose this will do:

sumXY / (N - 1)

given that the data is 'centered'? which hopefully would be the covariance of X and Y
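
As a quick numerical check of this idea, the sketch below compares the centered cross-sum divided by N - 1 against the textbook definition of sample covariance. It is plain Java with hypothetical names (`CenteredSumsSketch`, made-up ratings) and does not use Mahout; it only assumes the centered sums have the form quoted earlier in the thread (`sumXY - meanY * sumX`, `sumX2 - meanX * sumX`).

```java
/** Sketch only: sample covariance and variance from centered running sums. */
final class CenteredSumsSketch {

    /** Centered cross-sum via the shortcut sumXY - meanY * sumX. */
    static double centeredSumXY(double[] x, double[] y) {
        double sumX = 0.0, sumY = 0.0, sumXY = 0.0;
        for (int i = 0; i < x.length; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
        }
        double meanY = sumY / x.length;
        return sumXY - meanY * sumX;
    }

    /** Sample covariance: centered cross-sum divided by N - 1. */
    static double sampleCovariance(double[] x, double[] y) {
        return centeredSumXY(x, y) / (x.length - 1);
    }

    /** Sample variance is the covariance of a series with itself. */
    static double sampleVariance(double[] x) {
        return sampleCovariance(x, x);
    }

    /** Textbook definition, used only to cross-check the shortcut. */
    static double sampleCovarianceDirect(double[] x, double[] y) {
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;
        double acc = 0.0;
        for (int i = 0; i < x.length; i++) {
            acc += (x[i] - meanX) * (y[i] - meanY);
        }
        return acc / (x.length - 1);
    }

    public static void main(String[] args) {
        // Hypothetical ratings from four users who rated both items.
        double[] x = {5, 3, 4, 1};
        double[] y = {4, 2, 5, 1};
        System.out.println("shortcut = " + sampleCovariance(x, y));
        System.out.println("direct   = " + sampleCovarianceDirect(x, y));
        System.out.println("var(x)   = " + sampleVariance(x));
    }
}
```

Both covariance values agree up to floating-point rounding, so under these assumptions the centered sumXY divided by N - 1 is the sample covariance, and the centered sumX2 divided by N - 1 is the sample variance of X.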