--norm 2 and CosineDistanceMeasure are a good, fairly standard choice. The
L1 norm is useful for some things too, and you can use any positive integer,
or "INF" for L_infinity normalization.
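
To make the idea concrete, here is a rough sketch in plain Java of what
p-norm normalization and the resulting distance look like. This is not
Mahout's code: the class NormSketch and its method names are made up for
illustration, and it uses plain double[] arrays rather than Mahout vectors.

```java
// Hypothetical illustration only; not part of Mahout.
class NormSketch {

    // p-norm of v. Pass Double.POSITIVE_INFINITY for the L_infinity norm
    // (the largest absolute component).
    public static double norm(double[] v, double p) {
        if (Double.isInfinite(p)) {
            double max = 0.0;
            for (double x : v) {
                max = Math.max(max, Math.abs(x));
            }
            return max;
        }
        double sum = 0.0;
        for (double x : v) {
            sum += Math.pow(Math.abs(x), p);
        }
        return Math.pow(sum, 1.0 / p);
    }

    // Scale v so that its p-norm is 1. An all-zero vector is returned as zeros.
    public static double[] normalize(double[] v, double p) {
        double n = norm(v, p);
        double[] out = new double[v.length];
        if (n == 0.0) {
            return out;
        }
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / n;
        }
        return out;
    }

    public static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // For L2-normalized vectors this equals the cosine distance.
    public static double distance(double[] u, double[] v) {
        return 1.0 - dot(u, v);
    }

    public static void main(String[] args) {
        double[] u = normalize(new double[] {3.0, 4.0}, 2.0);
        double[] v = normalize(new double[] {4.0, 3.0}, 2.0);
        System.out.println("u.dot(u)       = " + dot(u, u));
        System.out.println("distance(u, v) = " + distance(u, v));
    }
}
```

With L2 normalization, u.dot(u) comes out as 1.0 up to floating-point
rounding, and 1 - u.dot(v) is the cosine distance between the original
vectors. With other values of p the vectors still have unit p-norm, but
1 - u.dot(v) is no longer the cosine distance.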

-jake

> Is it related to the distance calculation done by
> org.apache.mahout.common.distance.CosineDistanceMeasure, for example?
> I am currently using --norm 2 in combination with
> org.apache.mahout.common.distance.CosineDistanceMeasure. Is that OK,
> and what other options do I have for the --norm value?

> > It makes sure your vectors are all unit length (according to the norm
> > you choose; L2 norm means: make sure each vector satisfies
> > v.dot(v) == 1.0, for example).
> >
> > This makes sure that when you want to compare vectors to each other, a
> > nice "distance" function is just distance(u, v) = 1 - u.dot(v)

> >

> > -jake

> > > What is the practical meaning of the --norm parameter in the
> > > text-to-vector process
> > > (http://cwiki.apache.org/MAHOUT/creating-vectors-from-text.html)?

