Re: MoreLikeThis return no results

Tom Roberts LUXONLINE
AUTOMATIC REPLY

Tom Roberts is out of the office till 2nd September 2008.

LUX reopens on 1st September 2008



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

mark harwood
MoreLikeThis needs to find the terms in your doc. It tries to do this using TermFreqVectors, which are stored in the index only if you chose to add them at index time. If you haven't done this, it falls back to re-analysing the content of the document using an analyzer (despite what the javadocs for the setAnalyzer method say about not needing to set an analyzer when MoreLiking an existing document).

So your options are probably to re-index with term vectors turned on or set an appropriate choice of analyzer.

Cheers,
Mark
(only 3 days to go until Tom Roberts is back in the office! )
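
The fallback Mark describes can be pictured with a small plain-Java model. This is only a sketch and not Lucene code: the class TermSourceSketch, the IndexedField type, and the whitespace "analyzer" are all hypothetical, and it assumes the fallback can only re-analyze text that was actually stored in the index.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model of where the source terms come from; not Lucene code.
class TermSourceSketch {

    // A field as seen at query time: either part may be missing (null).
    static final class IndexedField {
        final List<String> termVector; // null when term vectors were not enabled
        final String storedText;       // null when the field content was not stored

        IndexedField(List<String> termVector, String storedText) {
            this.termVector = termVector;
            this.storedText = storedText;
        }
    }

    // Prefer the term vector; otherwise re-analyze stored text; otherwise nothing.
    static List<String> sourceTerms(IndexedField field) {
        if (field.termVector != null) {
            return field.termVector;
        }
        if (field.storedText != null) {
            // Stand-in for an analyzer: lowercase and split on whitespace.
            return Arrays.asList(field.storedText.toLowerCase().split("\\s+"));
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        System.out.println(sourceTerms(new IndexedField(Arrays.asList("lucene", "search"), null)));
        System.out.println(sourceTerms(new IndexedField(null, "Lucene Search")));
        System.out.println(sourceTerms(new IndexedField(null, null)));
    }
}
```

In this model, a field with neither a term vector nor stored text yields no terms at all, so the resulting query would be empty regardless of any other setting.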



----- Original Message ----
From: davood <[hidden email]>
To: [hidden email]
Sent: Saturday, 30 August, 2008 7:05:35
Subject: MoreLikeThis return no results


Hi,

I'm trying to get MoreLikeThis working, but it returns no results. I
have Lucene working for normal queries and indexing, but MoreLikeThis just
returns nothing. This is what I'm trying:


IndexReader reader = IndexReader.open(INDEX_PATH);
IndexSearcher searcher = new IndexSearcher(INDEX_PATH);
MoreLikeThis likeThis = new MoreLikeThis(reader);
likeThis.setFieldNames(new String[] { "tag", "tit" });
// 170 is a document number I already retrieved with hits.id(0)
Query likesQuery = likeThis.like(170);
Hits likesHits = searcher.search(likesQuery);

It finds nothing similar.

By the way, I've noticed that the similarity contrib package shipped with Lucene
contains nothing (a jar file with a license text file and another text file). I
tried downloading it from Subversion, but there were no Java classes there, so I
had to get it from another web site. Why was it removed from Subversion?
http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/similarity/src/java/org/apache/lucene/search/similar/

Best.
--
View this message in context: http://www.nabble.com/MoreLikeThis-return-no-results-tp19230752p19230752.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


davood-2
Hi,

I enabled term vectors for the required fields using the following piece of code:
Field titleField = new Field("title", title, Field.Store.NO, Field.Index.TOKENIZED, TermVector.YES);
and then re-indexed. But it still shows no results.
I checked the indexed documents: the term vectors exist and are correct, but MoreLikeThis returns no results for a given document id.

What am I missing?

Marcelo F. Ochoa
Hi Dave:
  The MoreLikeThis object has two parameters which control its behavior:
        mlt.setMinTermFreq(minTermFreq.intValue());
        mlt.setMinDocFreq(minDocFreq.intValue());
  By default MinTermFreq is 2, so if your document has no terms with a
frequency of at least 2, it will return a query with no terms, which returns 0
hits.
  Try setting it to 1.
  Best regards, Marcelo.
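
The effect of the term-frequency cut-off on a short field can be sketched in plain Java. This is a simplified stand-in, not Lucene's implementation: the class MinTermFreqSketch, its shortlist method, and the sample title are hypothetical, and tokenization is just a lowercase split on non-word characters.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for a minimum-term-frequency cut-off; not Lucene code.
class MinTermFreqSketch {

    // Returns the terms of `text` that occur at least `minTermFreq` times.
    static List<String> shortlist(String text, int minTermFreq) {
        Map<String, Integer> freq = new LinkedHashMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                freq.merge(token, 1, Integer::sum);
            }
        }
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> e : freq.entrySet()) {
            if (e.getValue() >= minTermFreq) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String title = "Introduction to Apache Lucene"; // hypothetical title
        System.out.println(shortlist(title, 2)); // prints []
        System.out.println(shortlist(title, 1)); // prints [introduction, to, apache, lucene]
    }
}
```

Short fields such as titles or tags rarely repeat a word, so a cut-off of 2 tends to discard every term, while a cut-off of 1 keeps them all.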

--
Marcelo F. Ochoa
http://marceloochoa.blogspot.com/
http://marcelo.ochoa.googlepages.com/home

mark harwood
MoreLikeThis essentially shortlists a large list of terms (found in example text or an existing doc) and uses them in a query.
To see which terms have been shortlisted, try calling query.rewrite(reader) and then call toString() or extractTerms() on the result.

If this reveals no terms, try using a debugger, which should help you step through MLT's shortlisting logic and reveal the issue (e.g. it can't find a TermVector, or no term meets the chosen frequency settings).
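
As a rough picture of what stepping through the shortlisting logic might reveal, here is a toy plain-Java model that reports why each term is kept or dropped. It is not Lucene code: the class ShortlistDiagnosisSketch, the sample frequencies, and the document-frequency threshold of 5 are assumptions made for illustration; only the term-frequency default of 2 comes from this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of the shortlisting checks, for illustration only; not Lucene code.
class ShortlistDiagnosisSketch {

    // Maps every source term to "kept" or to the reason it was dropped.
    static Map<String, String> diagnose(Map<String, Integer> termFreqs,
                                        Map<String, Integer> docFreqs,
                                        int minTermFreq, int minDocFreq) {
        Map<String, String> report = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            int tf = e.getValue();
            int df = docFreqs.getOrDefault(e.getKey(), 0);
            if (tf < minTermFreq) {
                report.put(e.getKey(), "term frequency " + tf + " < " + minTermFreq);
            } else if (df < minDocFreq) {
                report.put(e.getKey(), "doc frequency " + df + " < " + minDocFreq);
            } else {
                report.put(e.getKey(), "kept");
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // Hypothetical frequencies for a two-word title.
        Map<String, Integer> tf = new LinkedHashMap<>();
        tf.put("lucene", 1);
        tf.put("search", 1);
        Map<String, Integer> df = new LinkedHashMap<>();
        df.put("lucene", 40);
        df.put("search", 3);

        diagnose(tf, df, 2, 5).forEach((term, why) -> System.out.println(term + ": " + why));
        diagnose(tf, df, 1, 5).forEach((term, why) -> System.out.println(term + ": " + why));
    }
}
```

A report in which every term is dropped for the same reason points directly at the setting to relax.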




davood-2
Thanks so much for the hints; it works correctly now. The problem was with mlt.setMinTermFreq.

Many thanks.