I'm not so familiar with this formula, but you seem to be missing
something in the denominator... it's got to normalize somehow. I think
I said divide by standard deviation, but that's not quite it. What you
are really summing are the products of z-scores. See
http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient

But I think you should just use the formulation given in the code?
It should give the same result. At least I hope these aren't different
definitions of Pearson!
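To make the z-score point concrete, here is a rough Java sketch. The class and method names are made up for illustration and are not Mahout's API. It assumes sample standard deviations (dividing by n - 1) and computes r as the sum of the products of z-scores, divided by n - 1:

```java
/** Hypothetical helper (not Mahout code): Pearson r as the average product of z-scores. */
final class ZScorePearson {

  /** Arithmetic mean of the values. */
  static double mean(double[] v) {
    double sum = 0.0;
    for (double d : v) {
      sum += d;
    }
    return sum / v.length;
  }

  /** Sample standard deviation; assumes the n - 1 (sample) convention. */
  static double sampleStdDev(double[] v, double mean) {
    double sumSq = 0.0;
    for (double d : v) {
      sumSq += (d - mean) * (d - mean);
    }
    return Math.sqrt(sumSq / (v.length - 1));
  }

  /** r = sum(zx_i * zy_i) / (n - 1), where z = (value - mean) / sampleStdDev. */
  static double correlation(double[] x, double[] y) {
    if (x.length != y.length || x.length < 2) {
      throw new IllegalArgumentException("need two equal-length arrays with n >= 2");
    }
    double meanX = mean(x);
    double meanY = mean(y);
    double sdX = sampleStdDev(x, meanX);
    double sdY = sampleStdDev(y, meanY);
    double sumZ = 0.0;
    for (int i = 0; i < x.length; i++) {
      sumZ += ((x[i] - meanX) / sdX) * ((y[i] - meanY) / sdY);
    }
    return sumZ / (x.length - 1);
  }

  public static void main(String[] args) {
    double[] x = {1.0, 2.0, 3.0};
    double[] y = {1.0, 3.0, 2.0};
    System.out.println(correlation(x, y)); // prints 0.5
  }
}
```

The n - 1 inside each standard deviation and the n - 1 outside the sum cancel. That is why this should agree with the sumXY / sqrt(sumX2 * sumY2) form used in the code.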

On Fri, Nov 27, 2009 at 10:20 AM, jamborta <jamborta@gmail.com> wrote:
>
> Thank you, much clearer now.
>
> So for my purpose this will do:
>
> sumXY / (N-1)
>
> given that the data is 'centered'?
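Note what that quantity is. On centered data, sumXY / (N-1) is the sample covariance, not the correlation. It only becomes Pearson's r once it is divided by both sample standard deviations. A minimal Java sketch of the difference follows. The class and method names are hypothetical, and the inputs are assumed to be plain paired arrays:

```java
/** Hypothetical helper (not Mahout code): sample covariance vs. Pearson r on centered data. */
final class CenteredStats {

  /** Returns a copy of v with its mean subtracted, so the result has mean 0. */
  static double[] center(double[] v) {
    double mean = 0.0;
    for (double d : v) {
      mean += d;
    }
    mean /= v.length;
    double[] out = new double[v.length];
    for (int i = 0; i < v.length; i++) {
      out[i] = v[i] - mean;
    }
    return out;
  }

  /** sumXY / (N - 1) computed on centered inputs: the sample covariance. */
  static double sampleCovariance(double[] x, double[] y) {
    double[] cx = center(x);
    double[] cy = center(y);
    double sumXY = 0.0;
    for (int i = 0; i < cx.length; i++) {
      sumXY += cx[i] * cy[i];
    }
    return sumXY / (cx.length - 1);
  }

  /** Covariance divided by the two sample standard deviations: Pearson r. */
  static double correlation(double[] x, double[] y) {
    double sdX = Math.sqrt(sampleCovariance(x, x));
    double sdY = Math.sqrt(sampleCovariance(y, y));
    return sampleCovariance(x, y) / (sdX * sdY);
  }

  public static void main(String[] args) {
    double[] x = {1.0, 2.0, 3.0};
    double[] y = {2.0, 6.0, 4.0};
    System.out.println(sampleCovariance(x, y)); // prints 1.0
    System.out.println(correlation(x, y));      // prints 0.5
  }
}
```

If y is multiplied by 10, the covariance also grows by a factor of 10, but r stays at 0.5. That scale-independence is what the missing denominator provides.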

> On Fri, Nov 27, 2009 at 1:41 AM, jamborta <jamborta@gmail.com> wrote:
>>
>> Hi. I tried to figure out how you calculate Pearson correlation, and it
>> looks like you use this formula:
>>
>> sumXY / sqrt(sumX2 * sumY2)

> Yes, that's right; this is what Pearson reduces to when the means of X
> and Y are 0. And they are here: the implementation 'centers' the
> data.

>> where sumXY = sumXY - meanY * sumX;
>> sumX2 = sumX2 - meanX * sumX;
>> sumY2 = sumY2 - meanY * sumY;
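Here is a rough Java sketch of the quoted update lines written out as a complete computation. It assumes the raw sums (sumX, sumY, sumXY, sumX2, sumY2) are accumulated in a single pass over paired values. The class name and the loop are illustrative and are not the actual Mahout source:

```java
/** Hypothetical single-pass Pearson (not Mahout code): accumulate raw sums, then center them. */
final class SinglePassPearson {

  static double correlation(double[] x, double[] y) {
    int n = x.length;
    double sumX = 0.0, sumY = 0.0, sumXY = 0.0, sumX2 = 0.0, sumY2 = 0.0;
    for (int i = 0; i < n; i++) {
      sumX += x[i];
      sumY += y[i];
      sumXY += x[i] * y[i];
      sumX2 += x[i] * x[i];
      sumY2 += y[i] * y[i];
    }
    double meanX = sumX / n;
    double meanY = sumY / n;
    // Centering corrections quoted in the thread: afterwards the sums equal
    // sum((x-meanX)(y-meanY)), sum((x-meanX)^2) and sum((y-meanY)^2).
    sumXY = sumXY - meanY * sumX;
    sumX2 = sumX2 - meanX * sumX;
    sumY2 = sumY2 - meanY * sumY;
    // With centered sums, Pearson reduces to this ratio; no N-1 factor is needed.
    // Returns NaN if either input has zero variance.
    return sumXY / Math.sqrt(sumX2 * sumY2);
  }

  public static void main(String[] args) {
    double[] x = {1.0, 2.0, 3.0};
    double[] y = {1.0, 3.0, 2.0};
    System.out.println(correlation(x, y)); // prints 0.5
  }
}
```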

> Do you see the lines commented out there? Those are the full forms of the
> expressions, which may make more sense. This is centering the data,
> making the mean 0.

> This is a simplification based on the observation that, for example,
> sumX * meanY = sumY * meanX = n * meanY * meanX.
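Expanding the centered cross-product shows where the short form of sumXY comes from. Repeating the same steps with y replaced by x gives the sumX2 line:

```latex
\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
  &= \sum_i x_i y_i - \bar{y}\sum_i x_i - \bar{x}\sum_i y_i + n\,\bar{x}\,\bar{y} \\
  &= \sum_i x_i y_i - \bar{y}\sum_i x_i - n\,\bar{x}\,\bar{y} + n\,\bar{x}\,\bar{y}
     && \text{since } \bar{x}\sum_i y_i = n\,\bar{x}\,\bar{y} \\
  &= \text{sumXY} - \text{meanY}\cdot\text{sumX}
\end{aligned}
```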

>> I don't really understand how you got these equations. Could you explain
>> it to me? I thought Pearson correlation would be like this:
>>
>> E[(x_i-meanX)(y_i-meanY)] / (sdX*sdY)

> That's right, that's the expression for the population correlation, but
> we can really only compute a sample Pearson correlation coefficient,
> yes:

>> For my project I would need the sample correlation coefficient, which
>> would be something like this:
>>
>> sum((x_i-meanX)(y_i-meanY)) / (N-1)

> Yeah, that's fine too; this is another way of expressing the formula,
> though you're missing the two standard deviations in the denominator.
> It'll be clearer if I note that the means of X and Y are 0.
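Put the two sample standard deviations back in and take the means of X and Y as 0. The N-1 factors then cancel, and the sample formula collapses to the sumXY / sqrt(sumX2 * sumY2) expression used in the code:

```latex
r = \frac{\frac{1}{N-1}\sum_i x_i y_i}
         {\sqrt{\frac{1}{N-1}\sum_i x_i^2}\,\sqrt{\frac{1}{N-1}\sum_i y_i^2}}
  = \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2 \sum_i y_i^2}}
  = \frac{\text{sumXY}}{\sqrt{\text{sumX2}\cdot\text{sumY2}}}
```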

> --
> View this message in context:
> http://old.nabble.com/Mahout-Taste-covariance-between-two-items-tp26530825p26540395.html
> Sent from the Mahout User List mailing list archive at Nabble.com.