--norm 2 and CosineDistanceMeasure are a good, fairly standard choice. The L1
norm is useful for some things too, and you can use any positive integer, or
"INF" for L_infinity normalization.
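
To make the choice of norm concrete, here is a minimal sketch in plain Python. It is not Mahout's code; the helper names `p_norm` and `normalize` are made up for illustration, and it assumes p is either a positive number or infinity.

```python
import math

def p_norm(v, p):
    """Return the L_p norm of v; p is a positive number or math.inf."""
    if p == math.inf:
        # L_infinity norm: the largest absolute component
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def normalize(v, p):
    """Scale v so its L_p norm is 1; an all-zero vector is returned unchanged."""
    n = p_norm(v, p)
    if n == 0:
        return list(v)
    return [x / n for x in v]

v = [3.0, 4.0]
l1 = normalize(v, 1)           # absolute values of the components sum to 1
l2 = normalize(v, 2)           # Euclidean length is 1
linf = normalize(v, math.inf)  # largest absolute component is 1
```

With the L1 norm the components behave like proportions, with L2 the vector lands on the unit sphere, and with "INF" the largest component is scaled to 1.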

-jake

On Wed, Jan 13, 2010 at 4:32 PM, Bogdan Vatkov <[hidden email]> wrote:

> Is it related to the distance calculation done by
> org.apache.mahout.common.distance.CosineDistanceMeasure, for example?
> I am currently using --norm 2 in combination with
> org.apache.mahout.common.distance.CosineDistanceMeasure. Is that OK, and
> what other options do I have for the --norm value?

>

> On Thu, Jan 14, 2010 at 2:28 AM, Jake Mannix <[hidden email]> wrote:

>

> > It makes sure your vectors are all unit length (according to the norm you
> > choose; for example, the L2 norm means making sure each vector satisfies
> > v.dot(v) == 1.0).

> >

> > This makes sure that when you want to compare vectors to each other, a nice
> > "distance" function is just distance(u, v) = 1 - u.dot(v)

> >

> > -jake

> >

> > On Wed, Jan 13, 2010 at 4:22 PM, Bogdan Vatkov <[hidden email]> wrote:

> >

> > > What is the practical meaning of the --norm parameter in the text-to-vector
> > > (http://cwiki.apache.org/MAHOUT/creating-vectors-from-text.html) process?

> > >

> > > Best regards,

> > > Bogdan

> > >

> >

>

> --

> Best regards,

> Bogdan

>
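
The quoted point about unit-length vectors and distance(u, v) = 1 - u.dot(v) can be sketched as follows. This is plain Python for illustration only, not how Mahout's CosineDistanceMeasure is implemented; the helpers `dot`, `l2_normalize` and `distance` are hypothetical names.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def l2_normalize(v):
    """Scale v to unit Euclidean length; an all-zero vector is returned unchanged."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v] if length else list(v)

def distance(u, v):
    """1 - u.dot(v); this matches the cosine distance only for unit-length inputs."""
    return 1.0 - dot(u, v)

a = l2_normalize([1.0, 2.0, 0.0])
b = l2_normalize([2.0, 4.0, 0.0])  # same direction as a, different length
c = l2_normalize([0.0, 0.0, 5.0])  # orthogonal to a

d_same = distance(a, b)        # near 0: same direction
d_orthogonal = distance(a, c)  # near 1: nothing in common
```

Because the vectors are normalized up front, their original lengths (for example, document size) no longer affect the comparison; only direction does.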