A question about the naming of the cluster and points in synthetic data cluster

Liang Chenmin
Hi all,
    I am a newbie to Mahout. I have a question about how to attach names to
the clusters and points in the synthetic data cluster example.

    After getting the output of the synthetic data cluster, we have 6
clusters, and each one looks like:

### First, the information about the cluster:
0:name::{"class":"org.apache.mahout.matrix.SparseVector","vector":"{\"values\":{\"indices\":[0,1,2...59],\"values\":[29.58838112577385,...],\"numMappings\":60},\"cardinality\":60,\"lengthSquared\":-1.0,\"name\":\"\"}"}

### Then follow the points belonging to this cluster:
Points:
{"class":"org.apache.mahout.matrix.SparseVector","vector":"{\"values\":{\"indices\":[0,1,2,...,59],\"values\":[28.7812,34.4632,...],\"numMappings\":60},\"cardinality\":60,\"lengthSquared\":-1.0,\"name\":\"\"}"},

{"class":"org.apache.mahout.matrix.SparseVector","vector":"{\"values\":{\"indices\"
....


Is there a way for me to specify the name of the cluster? More
importantly, if I have an ID for each point, how can I show that ID
in the final result? I want to see clearly which IDs are in each
cluster. I have also run this on my own data, and the output is similar to the
above, although the indices differ because my matrix is sparse. Since
my data set is large, getting the IDs is quite important for me.

Thanks,
Mandy

Re: A question about the naming of the cluster and points in synthetic data cluster

Shashikant Kore
Check out ClusterDumper in utils
(utils/src/main/java/org/apache/mahout/utils/clustering/ClusterDumper.java).
This utility prints each cluster ID along with the IDs of its associated
vectors.
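
To illustrate the kind of listing described here (a cluster ID followed by
the IDs of its vectors), below is a minimal, self-contained Java sketch. It
does not use Mahout and does not reproduce ClusterDumper's code or its actual
output format. The class name ClusterIdListing, the point IDs, and the
assignment map are all hypothetical, chosen only for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: not part of Mahout or ClusterDumper.
public class ClusterIdListing {

    /** Groups point IDs under the cluster ID each point was assigned to. */
    static Map<Integer, List<String>> groupByCluster(Map<String, Integer> assignments) {
        // TreeMap keeps clusters sorted by ID; each list keeps insertion order.
        Map<Integer, List<String>> clusters = new TreeMap<>();
        for (Map.Entry<String, Integer> e : assignments.entrySet()) {
            clusters.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return clusters;
    }

    /** Renders one line per cluster: the cluster ID followed by its point IDs. */
    static String render(Map<Integer, List<String>> clusters) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, List<String>> e : clusters.entrySet()) {
            sb.append("cluster ").append(e.getKey()).append(": ")
              .append(String.join(", ", e.getValue())).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up assignments (point ID -> cluster ID), standing in for real results.
        Map<String, Integer> assignments = new LinkedHashMap<>();
        assignments.put("series-001", 0);
        assignments.put("series-002", 1);
        assignments.put("series-003", 0);
        System.out.print(render(groupByCluster(assignments)));
    }
}
```

The sketch assumes each point already carries a stable ID before clustering;
without one, there is nothing to list next to the cluster ID.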

--shashi

On Wed, Nov 25, 2009 at 5:47 AM, Liang Chenmin <[hidden email]> wrote:

> Hi all,
>    I am a newbie to Mahout. I have a question about how to incorporate some
> naming for cluster and points in the synthetic data cluster example.
>
>    After getting the output of the synthetic data cluster, we have 6
> clusters, and each one looks like:
>
> ###First is the information of the cluster
> 0:name::{"class":"org.apache.mahout.matrix.SparseVector","vector":"{\"values\":{\"indices\":[0,1,2...59],\"values\":[29.58838112577385,...],\"numMappings\":60},\"cardinality\":60,\"lengthSquared\":-1.0,\"name\":\"\"}"}
>
> ###And then follow by points belong to this cluster:
> Points:
> {"class":"org.apache.mahout.matrix.SparseVector","vector":"{\"values\":{\"indices\":[0,1,2,...,59],\"values\":[28.7812,34.4632,......
> ],],\"numMappings\":60},\"cardinality\":60,\"lengthSquared\":-1.0,\"name\":\"\"}"},
>
> {"class":"org.apache.mahout.matrix.SparseVector","vector":"{\"values\":{\"indices\"
> ....
>
>
> Is there a way for me to specify the name of the cluster? And more
> importantly, if I actually have ID for each point, how could I show the ID
> for each point in the final result? I want to see clearly the IDs in each
> cluster. I have used my own data also, and the output is similar to the ones
> above, although the indices are not the same as my matrix are sparse. And as
> my data set is large, getting the IDs is quite important for me.
>
> Thanks,
> Mandy
>