Hi,

The scores aren't (log) normalized until they're loaded in the map phase. Take a look at LDAState. The array

private final double[] logTotals; // log \sum p(w|t) for topic=1..nTopics

in LDAState holds the normalization constants. The method logProbWordGivenTopic is intended for access... LDADriver#createState is a roundabout way of creating an LDAState.
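
To make the normalization concrete, here is a minimal sketch of the arithmetic, not the Mahout code itself. It assumes the stored value for each word in a topic is an unnormalized log score and that the topic's normalizer is the log of the sum of the exponentiated scores, so that P(w|t) = exp(score - logTotal). The class and method names (LdaNormalizeSketch, logSumExp, wordProbabilities) are made up for illustration and are not part of LDAState.

```java
// Hypothetical helper, not part of Mahout: normalizes one topic's log scores.
final class LdaNormalizeSketch {
  private LdaNormalizeSketch() {}

  /** Numerically stable log(sum_i exp(x[i])); plays the role of one logTotals entry. */
  static double logSumExp(double[] x) {
    double max = Double.NEGATIVE_INFINITY;
    for (double v : x) {
      max = Math.max(max, v);
    }
    if (max == Double.NEGATIVE_INFINITY) {
      return max; // empty input or all scores are -infinity
    }
    double sum = 0.0;
    for (double v : x) {
      sum += Math.exp(v - max); // shift by max to avoid overflow
    }
    return max + Math.log(sum);
  }

  /** Converts unnormalized log scores for a single topic into P(w|t). */
  static double[] wordProbabilities(double[] logScores) {
    double logTotal = logSumExp(logScores);
    double[] p = new double[logScores.length];
    for (int w = 0; w < p.length; w++) {
      p[w] = Math.exp(logScores[w] - logTotal); // subtract the normalizer in log space
    }
    return p;
  }

  public static void main(String[] args) {
    // Assumed example scores; exp(2.0) alone is greater than 1.
    double[] logScores = {2.0, 1.0, -0.5};
    for (double p : wordProbabilities(logScores)) {
      System.out.println(p);
    }
  }
}
```

This is why exp on the raw output can land outside [0, 1]: the subtraction of the per-topic total has not happened yet. With the real classes, going through logProbWordGivenTopic rather than redoing the sum by hand is probably the safer route.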

-- David

On Mon, Dec 6, 2010 at 12:06 PM, Quiroz Hernandez, Andres <[hidden email]> wrote:

> Thanks for your quick reply, Ted. It looks like either the probabilities are not normalized or the function being used is not a simple sum of log probabilities, because exp does not always return a value between 0 and 1. I will take a look at the code to see if I can find exactly how the value is calculated (but if anyone knows the function used, and if I can directly invert it to find P(w|t) please let me know).

>

> Thanks again,

>

> Andres

>

> -----Original Message-----

> From: Ted Dunning [mailto:[hidden email]]
> Sent: Monday, December 06, 2010 11:57 AM
> To: [hidden email]
> Subject: Re: Probability from log likelihood in LDA output

>

> Yes. It should be possible to use exp to get the actual probability. The
> fact that it is a sum of log probabilities just means that the probability
> is a product of probabilities.

>

> It is possible that the probabilities are not normalized, but that would be
> a bit surprising for this kind of algorithm.

>

> On Mon, Dec 6, 2010 at 8:02 AM, Quiroz Hernandez, Andres <[hidden email]> wrote:

>

>> Hello,

>>

>> As I understand it, the output for LDA is a log likelihood value for
>> each word/topic pair, which is a function of log(P(w|t)). Is it possible
>> to invert that function to obtain P(w|t)? I have a feeling it is not,
>> since it looks like the final value is obtained as a sum of log
>> probabilities, but I just wanted to check, since an output as a
>> probability is more readable than the likelihood value given.

>>

>> Thanks,

>>

>> Andres

>>

>