Need more info on MLT (More Like This) feature


Need more info on MLT (More Like This) feature

Srisatya Pyla
Hi Solr Search Team,

I am a developer at IBM Kenexa Brassring. We use the Solr search engine to search jobs in our applications.
We are planning to use the MLT feature to find documents (jobs) similar to a given document (job).

While exploring this option, we are using the job's JobDescription as the matching field, and the MLT results include some unrelated documents that we did not expect.

The query is as follows:

http://[SOLR URL]/mlt?q=sjkey:1414462-25600-5258&wt=json&indent=true&mlt=true&rows=100&mlt.fl=jobdescription&mlt.mindf=1&mlt.mintf=1&fl=jobtitle,jobdescription&fq=siteid:5258


We have a few questions:
1) Is there any way to get the matching score for each document in the MLT results, so that we can sort on the score and have the best-matching document at the top of the results?

2) Are there any best practices for using the MLT handler?


Regards,

SST  Narasimha Rao Pyla
IBM Talent Management Solutions
Mobile :
+91 9849315546
E-mail :srispyla@...


IBM Visakha Hills
Visakhapatnam, AP 530045
India



Re: Need more info on MLT (More Like This) feature

Chee Yee Lim
I've been working with the MLT handler (Solr 8.1.1), calling it the same way
you did, http://[SOLR URL]/mlt. But the response is very unreliable: about
90% of identical queries result in a Java NullPointerException, and only
10% return the expected response. I do not know the cause of this.

I overcame this problem by using knnSearch via the Stream handler (
https://lucene.apache.org/solr/guide/8_1/stream-source-reference.html#knnsearch).
It is just a wrapper around MLT, and it works brilliantly. It is worth checking
out if you are running Solr in cloud mode.

If you pass fl="score" and sort="score desc" to knnSearch, you will be able
to get the results sorted by matching score.

Best wishes,
Chee Yee

On Thu, 12 Sep 2019 at 19:44, Srisatya Pyla <[hidden email]> wrote:

> [...]

RE: Need more info on MLT (More Like This) feature

Srisatya Pyla
Hi Chee Yee Lim,


Thank you for your quick response.
We could not find much documentation on how to use knnSearch.
Could you please give us more information on how it can be used?

Can we call it the way we call Solr today, with a URL like http://[SOLR URL]/mlt...., or is there some other way?
Please also share any more detailed documentation you may have.


Regards,

SST  Narasimha Rao Pyla
 
----- Original message -----
From: Chee Yee Lim <[hidden email]>
To: [hidden email]
Cc: Archana Gavini1 <[hidden email]>, Rajeev Kasarabada1 <[hidden email]>
Subject: [EXTERNAL] Re: Need more info on MLT (More Like This) feature
Date: Thu, Sep 12, 2019 6:43 PM
 

[...]

Re: Need more info on MLT (More Like This) feature

Chee Yee Lim
To use knnSearch, you need to submit a POST request to the Stream request handler.

Using your example query, you will need to rewrite it from this:

http://[SOLRURL]/mlt?q=sjkey:1414462-25600-5258&wt=json&indent=true&mlt=true&rows=100&mlt.fl=jobdescription&mlt.mindf=1&mlt.mintf=1&fl=jobtitle,jobdescription&fq=siteid:5258

to this (using curl as an example of sending the POST request):

curl --data-urlencode 'expr=knnSearch([collection_name],
id="1414462-25600-5258",
qf="jobdescription",
k=100,
fl="jobtitle,jobdescription,score",
sort="score desc",
fq="siteid:5258",
mintf=1,
mindf=1)' http://[SOLRURL]/stream

Note that this assumes your document ID field is sjkey.

More detailed documentation on how the Stream handler works can be found here: https://lucene.apache.org/solr/guide/8_1/streaming-expressions.html.
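
For anyone calling this from application code rather than curl, here is a minimal Python sketch of the same request. It only assembles the knnSearch expression shown above and form-encodes it; the collection name "jobs" and the helper names are assumptions for illustration, and post_stream is defined but not called here because it needs a running Solr.

```python
from urllib.parse import urlencode, parse_qs
import urllib.request


def knn_search_expr(collection, doc_id, qf, k, fl, fq, mintf=1, mindf=1):
    """Build the knnSearch streaming expression from the curl example."""
    return (
        f"knnSearch({collection}, "
        f'id="{doc_id}", '
        f'qf="{qf}", '
        f"k={k}, "
        f'fl="{fl}", '
        f'sort="score desc", '
        f'fq="{fq}", '
        f"mintf={mintf}, "
        f"mindf={mindf})"
    )


def post_stream(stream_url, expr):
    """POST the expression to the Stream handler (hypothetical helper, not run here)."""
    data = urlencode({"expr": expr}).encode("utf-8")
    # Passing data= makes urllib send a POST, like curl --data-urlencode.
    with urllib.request.urlopen(stream_url, data=data) as resp:
        return resp.read().decode("utf-8")


# "jobs" is a placeholder collection name, not taken from the thread.
expr = knn_search_expr(
    "jobs",
    "1414462-25600-5258",
    "jobdescription",
    100,
    "jobtitle,jobdescription,score",
    "siteid:5258",
)
body = urlencode({"expr": expr})
print(expr)
print(body)
```

Calling post_stream("http://[SOLRURL]/stream", expr) with your real Solr URL should then behave like the curl command, with score included in fl so the results can be ordered by it.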

Best wishes,
Chee Yee

On Fri, 13 Sep 2019 at 17:57, Srisatya Pyla <[hidden email]> wrote:
[...]

RE: Need more info on MLT (More Like This) feature

Srisatya Pyla
Thank you very much for the quick response. It is very helpful to us.
While analyzing the results for some jobs, we found that a document that is not very relevant to the base document receives a high score.
Is there any way we can improve the results and scoring?
How exactly is the score for a matching document computed from the matching field? Knowing this would help us understand why specific documents receive the highest matching scores.


Regards,

SST  Narasimha Rao Pyla

From:        Chee Yee Lim <[hidden email]>
To:        Srisatya Pyla <[hidden email]>
Cc:        [hidden email], Rajeev Kasarabada1 <[hidden email]>, Archana Gavini1 <[hidden email]>
Date:        13/09/2019 04:32 PM
Subject:        [EXTERNAL] Re: Need more info on MLT (More Like This) feature




[...]

Re: Need more info on MLT (More Like This) feature

David Hastings
As a side note, if you use shingles with the MLT handler, I believe you will get better scores and more relevant results. So “to be free” is indexed as “to_be”, “to_be_free” and “be_free”, but also as each individual word. It makes the index significantly larger but creates better “unique terms” in my opinion, and it improved the results for me at least.
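
To make the shingle idea concrete, here is a toy Python sketch that produces the same tokens for “to be free”. It is only an illustration of the concept: in Solr this would normally be done in the field's analyzer (for example with solr.ShingleFilterFactory), and the function name, whitespace tokenizing and output order below are my own simplifications, not what Solr does internally.

```python
def shingles(text, min_size=2, max_size=3, sep="_"):
    """Return the single words plus word n-grams (shingles) of the given sizes."""
    words = text.lower().split()
    tokens = list(words)  # keep each individual word as well
    for size in range(min_size, max_size + 1):
        for start in range(len(words) - size + 1):
            tokens.append(sep.join(words[start:start + size]))
    return tokens


print(shingles("to be free"))
# ['to', 'be', 'free', 'to_be', 'be_free', 'to_be_free']
```

The multi-word tokens are much rarer in the index than the individual words, which is why they can act as more distinctive terms for MLT, at the cost of a larger index.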

> On Sep 13, 2019, at 2:51 PM, Srisatya Pyla <[hidden email]> wrote:
> [...]

Re: Need more info on MLT (More Like This) feature

Chee Yee Lim
By default, MLT uses the top 25 terms from the target document to run its
similarity search. A quick look at the source code (
https://github.com/apache/lucene-solr/blob/master/lucene/queries/src/java/org/apache/lucene/queries/mlt/MoreLikeThis.java
) and the Lucene documentation (
https://lucene.apache.org/core/8_1_0/queries/org/apache/lucene/queries/mlt/MoreLikeThis.html
) suggests that MLT's similarity score is a simple TF x IDF over
the top 25 terms. (Others who know more about MLT, please correct me if I
am wrong.)

An easy way to improve your results is to tune the mindf, maxdf, minwl and
maxwl parameters for knnSearch (
https://lucene.apache.org/solr/guide/8_1/stream-source-reference.html#knnsearch
).

Best wishes,
Chee Yee

On Sat, 14 Sep 2019 at 04:09, Dave <[hidden email]> wrote:

> [...]

Re: Need more info on MLT (More Like This) feature

Alessandro Benedetti
In addition to all the valuable information already shared, I am curious to
understand why you think the results are unreliable.
Most of the time it is the parameters that cause some of the terms of the
original document/corpus to be ignored (as simple as the min/max document
frequency to consider, or the min term frequency in the source doc).

I have been working a lot on MLT in the past years and presenting the
work done (and its internals) at various conferences/meetups.

I'll share a talk, some slides and some Jira issues that may help you:

https://www.youtube.com/watch?v=jkaj89XwHHw&t=540s
https://www.slideshare.net/SeaseLtd/how-the-lucene-more-like-this-works

https://issues.apache.org/jira/browse/LUCENE-8326
https://issues.apache.org/jira/browse/LUCENE-7802
https://issues.apache.org/jira/browse/LUCENE-7498

Generally speaking, I favour the MLT query parser: it builds the MLT query
and gives you the chance to inspect it using the debug query.
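
As a hedged sketch of what that could look like for the job example in this thread, the Python below only builds a /select URL that uses the {!mlt} query parser with debugQuery=true. The base URL, the collection name "jobs" and the helper name are assumptions, and the parser expects the value of the document's uniqueKey, which earlier messages assume to be sjkey.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def mlt_qparser_url(base_url, collection, doc_id, qf, mintf=1, mindf=1, rows=10):
    """Build a /select URL that runs the MLT query parser with debug output."""
    params = {
        # {!mlt ...} takes the uniqueKey value of the source document.
        "q": f"{{!mlt qf={qf} mintf={mintf} mindf={mindf}}}{doc_id}",
        "fq": "siteid:5258",
        "fl": "jobtitle,jobdescription,score",
        "rows": rows,
        "debugQuery": "true",  # shows the query that MLT actually built
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Placeholder host and collection, not taken from the thread.
url = mlt_qparser_url("http://localhost:8983/solr", "jobs",
                      "1414462-25600-5258", "jobdescription")
decoded = parse_qs(urlsplit(url).query)
print(url)
print(decoded["q"][0])
```

The debug section of the response should then show the parsed query, that is, which terms from jobdescription were selected and how they were boosted, which helps explain why a seemingly unrelated job scores highly.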



-----
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io