Norm in text vectors?

Bogdan94202
What is the practical meaning of the --norm parameter in the text-to-vector
process (http://cwiki.apache.org/MAHOUT/creating-vectors-from-text.html)?

Best regards,
Bogdan

Re: Norm in text vectors?

Jake Mannix
It makes sure your vectors are all unit length according to the norm you
choose. With the L2 norm, for example, each vector satisfies
v.dot(v) == 1.0.

This means that when you want to compare vectors to each other, a nice
"distance" function is just distance(u, v) = 1 - u.dot(v). For L2-normalized
vectors this is the same value as the cosine distance.

  -jake

On Wed, Jan 13, 2010 at 4:22 PM, Bogdan Vatkov <[hidden email]> wrote:

> What is the practical meaning of --norm parameter in the text-to-vector (
> http://cwiki.apache.org/MAHOUT/creating-vectors-from-text.html) process?
>
> Best regards,
> Bogdan
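
The point made in the reply above can be checked with a few lines of code. What follows is a minimal sketch in plain Python, not Mahout code: the helper names (l2_normalize, dot, cosine_distance) and the sample term weights are hypothetical and chosen only for illustration. It shows that after L2 normalization, 1 - u.dot(v) gives the same value as the full cosine distance computed on the raw vectors.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale v to unit Euclidean length; a zero vector is returned unchanged."""
    length = math.sqrt(dot(v, v))
    if length == 0.0:
        return list(v)
    return [x / length for x in v]


def cosine_distance(u, v):
    """Cosine distance on raw (unnormalized) vectors: 1 - u.v / (|u| * |v|)."""
    return 1.0 - dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


# Hypothetical term-weight vectors for two documents.
doc_a = [3.0, 0.0, 4.0]
doc_b = [1.0, 2.0, 2.0]

a = l2_normalize(doc_a)
b = l2_normalize(doc_b)

simple = 1.0 - dot(a, b)              # shortcut, valid once both are unit length
full = cosine_distance(doc_a, doc_b)  # general formula on the raw vectors

print(dot(a, a))     # approximately 1.0
print(simple, full)  # the two distances agree up to rounding
```

The shortcut only holds for the L2 norm; with other norms the vectors are unit length in a different sense and 1 - u.dot(v) is no longer the cosine distance.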

Re: Norm in text vectors?

Bogdan94202
Is it related to the distance calculation done by
org.apache.mahout.common.distance.CosineDistanceMeasure, for example?
I am currently using --norm 2 in combination with
org.apache.mahout.common.distance.CosineDistanceMeasure. Is that OK? What
other options do I have for the --norm value?

On Thu, Jan 14, 2010 at 2:28 AM, Jake Mannix <[hidden email]> wrote:

> It makes sure your vectors are all unit length (according to the norm you
> choose - L2 norm
> means: make sure each vector satisfies v.dot(v) == 1.0, for example)
>
> This makes sure that when you want to compare vectors to each other, a nice
> "distance"
> function is just distance(u, v) = 1 - u.dot(v)
>
>  -jake

--
Best regards,
Bogdan

Re: Norm in text vectors?

Jake Mannix
--norm 2 with CosineDistanceMeasure is a good, fairly standard choice. The L1
norm is useful for some things too, and you can use any positive integer, or
"INF" for L_infinity normalization.

  -jake

On Wed, Jan 13, 2010 at 4:32 PM, Bogdan Vatkov <[hidden email]> wrote:

> Is it related to the distance calculation done
> by org.apache.mahout.common.distance.CosineDistanceMeasure for example?
> I am currently using --norm 2 in combination
> with org.apache.mahout.common.distance.CosineDistanceMeasure, is it ok,
> what
> other options I have for the --norm value?
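
To make the different --norm choices concrete, here is a rough sketch of what normalizing by an L_p norm means for p = 1, p = 2 and L_infinity. It is plain Python and only illustrates the math; it is not Mahout's implementation, the function names are hypothetical, and using math.inf to stand in for the "INF" option is an assumption made for this example.

```python
import math


def p_norm(v, p):
    """L_p norm of v; p is a positive integer or math.inf for L_infinity."""
    if p == math.inf:
        # L_infinity norm: the largest absolute component.
        return max((abs(x) for x in v), default=0.0)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


def normalize(v, p):
    """Divide v by its L_p norm; a zero vector is returned unchanged."""
    n = p_norm(v, p)
    if n == 0.0:
        return list(v)
    return [x / n for x in v]


# Hypothetical term weights for one document.
weights = [3.0, 0.0, 4.0]

l1 = normalize(weights, 1)           # absolute values sum to 1
l2 = normalize(weights, 2)           # Euclidean length is 1
linf = normalize(weights, math.inf)  # largest absolute component is 1

print(l1)
print(l2)
print(linf)
```

With the L1 norm the components behave like proportions of the document's total weight, while L_infinity scales everything relative to the heaviest term.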