Document topic mapping problem using Mahout 0.9 CVB algorithm

newein
This post has NOT been accepted by the mailing list yet.
Hi,

I am trying to do topic analysis on a set of documents using the latest version of Mahout (0.9) and its CVB algorithm.

The topic-to-term output looks correct: each topic has a list of terms with their corresponding probabilities.
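Each topic in the topic-to-term output is a probability distribution over the vocabulary, so reading it amounts to ranking terms by probability. Below is a minimal sketch in plain Java, with hypothetical terms and probabilities standing in for one topic. It does not use any Mahout classes, and the class and method names are illustrative only.

```java
import java.util.*;
import java.util.stream.*;

class TopTerms {
    // Return the n terms with the highest probability for one topic, highest first
    static List<String> topN(Map<String, Double> termProbs, int n) {
        return termProbs.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical p(term | topic) for a single topic
        Map<String, Double> topic = new HashMap<>();
        topic.put("mahout", 0.40);
        topic.put("hadoop", 0.35);
        topic.put("cluster", 0.20);
        topic.put("abandon", 0.05);
        System.out.println(topN(topic, 2)); // [mahout, hadoop]
    }
}
```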

But when I try to get the document-to-topic mapping, it only displays a set of topics that start with the same letter. In this case, they all start with the letter "a".

This is the code I use to dump the document-topic mapping:

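import org.apache.mahout.utils.vectors.VectorDumper;

VectorDumper.main(new String[] {
        "-i", inputDocTopicsDir,     // document-topic vectors written by CVB
        "-o", oututDocTopicsDir,     // file the dump is written to
        "-d", inputDictionaryDir,    // term dictionary (term ID -> term)
        "-dt", "sequencefile",       // format of the dictionary
        "-sort", "true",
        "-vs", "10" });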


Sample output:
{2d:0.019996671414880783,3d:0.019994853350969108,4d:0.02000171234917903,5d:0.019994290328033588,a.config:0.01999309367417373,a.k.a:0.02000227944902019,a.system:0.01999771644223781,aaa:0.020003361639812457,aam:0.019990182999365072,aapm:0.020012465032122083,aapv:0.01999879522431889,aar:0.019995543474585993,aas:0.019995157547471696,aav:0.02000267326012652,ab:0.020025978185034182,aba:0.01999553819903237,abandon:0.020013355238553677,abandoned:0.01999559962237951,abandonment:0.019994194616256,abandons:0.02001433184497984,abatement:0.01997728075793184,abberationa:0.020001189392395737}
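One possible explanation, offered as an assumption rather than a confirmed diagnosis: each document-topic vector is indexed by topic ID (0 to k-1), not by term ID, so passing the term dictionary with -d would make the dumper label topic i with dictionary term i. If the dictionary assigns IDs in sorted term order, which the alphabetical sample above suggests, every document would show the first k dictionary terms. The near-identical weights of about 0.02 would also be consistent with a nearly uniform distribution over roughly 50 topics, though the topic count is not stated here. The plain-Java sketch below simulates that labeling with a hypothetical five-term dictionary and a four-topic vector. It does not call Mahout, and all names in it are illustrative.

```java
import java.util.*;

class DocTopicLabels {
    // Hypothetical term dictionary: array index = term ID, IDs assigned in sorted term order
    static final String[] DICTIONARY = {"2d", "3d", "4d", "5d", "a.config"};

    // Mimics dumping with a term dictionary: entry i is labeled with term ID i
    static Map<String, Double> labelWithDictionary(double[] docTopics) {
        Map<String, Double> labeled = new LinkedHashMap<>();
        for (int i = 0; i < docTopics.length; i++) {
            labeled.put(DICTIONARY[i], docTopics[i]);
        }
        return labeled;
    }

    // Labels entry i with its topic ID, which is what the index means in a doc-topic vector
    static Map<String, Double> labelWithTopicIds(double[] docTopics) {
        Map<String, Double> labeled = new LinkedHashMap<>();
        for (int i = 0; i < docTopics.length; i++) {
            labeled.put("topic" + i, docTopics[i]);
        }
        return labeled;
    }

    public static void main(String[] args) {
        // Hypothetical p(topic | document) for a model with 4 topics
        double[] docTopics = {0.70, 0.10, 0.15, 0.05};
        System.out.println(labelWithDictionary(docTopics)); // {2d=0.7, 3d=0.1, 4d=0.15, 5d=0.05}
        System.out.println(labelWithTopicIds(docTopics));   // {topic0=0.7, topic1=0.1, topic2=0.15, topic3=0.05}
    }
}
```

If this is the cause, running the same VectorDumper call without -d and -dt should print the entries keyed by numeric index (the topic ID) instead of by dictionary term.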