two problems of using the lucene.

jason-51
Hi,

I have two questions about using Lucene and would appreciate your help.

1. How does Lucene calculate the weight of each word? All I know is that each
word in a document is weighted by its tf/idf value.

2. Can I modify Lucene so that it uses the raw term frequency instead of the
tf/idf value to calculate the similarity between documents and queries?

--
Regards

Jiang Xing

Re: two problems of using the lucene.

Klaus Schaefers
Hi,

You have to write your own Similarity implementation and set it on your
IndexSearcher (and on your IndexWriter, so that index-time norms use it too).

http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html
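
To make the tf-only idea concrete, here is a minimal, self-contained sketch. It does not use Lucene at all: `TermWeighting`, `TfIdfWeighting`, `TfOnlyWeighting` and `WeightingSketch` are hypothetical names invented for illustration, and the scoring loop is a deliberate simplification that leaves out norms, boosts and coordination factors. The `tf` and `idf` formulas in `TfIdfWeighting` are meant to mirror the defaults of Lucene's DefaultSimilarity of that era.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the two scoring hooks; this is NOT Lucene's Similarity class. */
interface TermWeighting {
    float tf(float freq);                 // contribution of the in-document frequency
    float idf(int docFreq, int numDocs);  // contribution of the term's rarity in the corpus
}

/** Mirrors the default formulas: sqrt(freq) and ln(numDocs / (docFreq + 1)) + 1. */
final class TfIdfWeighting implements TermWeighting {
    public float tf(float freq) { return (float) Math.sqrt(freq); }
    public float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }
}

/** Raw term frequency only: idf is neutralised by returning a constant 1. */
final class TfOnlyWeighting implements TermWeighting {
    public float tf(float freq) { return freq; }
    public float idf(int docFreq, int numDocs) { return 1.0f; }
}

final class WeightingSketch {
    /** Number of times the term occurs in one tokenised document. */
    static int termFreq(String term, List<String> doc) {
        int n = 0;
        for (String token : doc) {
            if (token.equals(term)) n++;
        }
        return n;
    }

    /** Number of documents in the corpus that contain the term. */
    static int docFreq(String term, List<List<String>> corpus) {
        int n = 0;
        for (List<String> doc : corpus) {
            if (doc.contains(term)) n++;
        }
        return n;
    }

    /** Simplified score: sum of tf * idf over the query terms found in the document. */
    static float score(List<String> query, List<String> doc,
                       List<List<String>> corpus, TermWeighting w) {
        float sum = 0f;
        for (String term : query) {
            int freq = termFreq(term, doc);
            if (freq == 0) continue;  // term absent from this document
            sum += w.tf(freq) * w.idf(docFreq(term, corpus), corpus.size());
        }
        return sum;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("lucene", "lucene", "search"),
                Arrays.asList("search", "engine"));
        List<String> query = Arrays.asList("lucene", "search");
        System.out.println("tf/idf : " + score(query, corpus.get(0), corpus, new TfIdfWeighting()));
        System.out.println("tf only: " + score(query, corpus.get(0), corpus, new TfOnlyWeighting()));
    }
}
```

In Lucene itself the equivalent change would presumably be a subclass of DefaultSimilarity that overrides `tf` and `idf` in the same way, installed with `setSimilarity` on the searcher; check the Javadoc linked above for the exact signatures in your version.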

Cheers,

Klaus
-----Original Message-----
From: xing jiang [mailto:[hidden email]]
Sent: Sunday, February 5, 2006 04:27
To: [hidden email]
Subject: two problems of using the lucene.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: two problems of using the lucene.

jason-51
Hi,

I tried to read the Lucene source code, but the only place I could find where
the tf/idf measure is actually implemented is TermScorer.java. My guess is
that the QueryParser class first converts each word into a TermQuery, and
that compound queries such as BooleanQuery are then built from those.

The source code of QueryParser.java is hard to read.
....
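
For orientation, the per-term tf/idf weights are combined across the query's terms roughly as follows. This is an editorial paraphrase of the scoring formula given in later versions of the Similarity Javadoc, not a quotation; the symbols are shorthand for the correspondingly named Similarity methods, and the exact form may differ between Lucene versions.

```latex
\mathrm{score}(q,d) \;=\; \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot
  \sum_{t \in q} \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)^{2}\cdot\mathrm{boost}(t)\cdot\mathrm{norm}(t,d)

% Defaults in DefaultSimilarity:
\mathrm{tf}(t,d) = \sqrt{\mathrm{freq}(t,d)}, \qquad
\mathrm{idf}(t) = 1 + \ln\frac{\mathrm{numDocs}}{\mathrm{docFreq}(t)+1}
```

Under this reading, replacing idf with the constant 1 and tf with the raw frequency leaves a score driven by term frequency alone (still scaled by the norm and coordination factors).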

regards
jiang xing


Re: two problems of using the lucene.

Erik Hatcher

On Feb 6, 2006, at 1:37 AM, jason wrote:
> The source code of the Queryparser.java is hard to read.

Look at QueryParser.jj instead.  QueryParser.java is generated using  
JavaCC and is thus not "source" code at all.

        Erik

