Does CloudMLTQParser support getting related documents from different shards?

gnandre
Hi,

I have a custom Solr plugin that uses the MoreLikeThis class. AFAIK, the
MoreLikeThis handler does not support distributed mode, and the issue for
that is still open: https://issues.apache.org/jira/browse/SOLR-5480.

However, I saw that CloudMLTQParser might be a way to work around the above
issue, since it claims to work in distributed mode too:
https://issues.apache.org/jira/browse/SOLR-6248. Looking at the code, though,
https://github.com/apache/lucene-solr/blob/2d690885e554dda7b4b4e0f46f2bd9cacdb32df6/solr/core/src/java/org/apache/solr/search/mlt/CloudMLTQParser.java,
it seems that the seed document (the one we are finding related documents
for) is now fetched with the real-time get request handler:

    core.getRequestHandler("/get").handleRequest(request, rsp);

This is good, because the document will now be fetched from whichever shard
in the cloud holds it.

However, the part that is supposed to fetch the related documents still uses
the same old code, e.g.:

    MoreLikeThis mlt = new MoreLikeThis(req.getSearcher().getIndexReader());

Isn't getSearcher() bound to just the local shard? AFAIK it does not work
across shards in SolrCloud.

So how does CloudMLTQParser work in SolrCloud mode? Does it work at all?
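A small worked sketch of why the shard-local searcher matters. My reading of Lucene's MoreLikeThis (an assumption, not something confirmed in this thread) is that the IndexReader passed to it is used to weight candidate terms through docFreq and numDocs, so on a sharded collection those weights come from one shard's statistics rather than the whole collection's. The code assumes the classic TF-IDF formula idf = 1 + ln((numDocs + 1) / (docFreq + 1)), as in recent versions of Lucene's ClassicSimilarity; the class name and the document counts are made up for illustration:

```java
import java.util.Locale;

/**
 * Illustrative only: compares the classic TF-IDF idf of one term computed from
 * a single shard's statistics with the idf computed from whole-collection statistics.
 */
public class ShardLocalIdf {

    // Assumed classic TF-IDF formula: idf = 1 + ln((numDocs + 1) / (docFreq + 1)).
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((numDocs + 1) / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        // Hypothetical counts: the term appears in 1 of the 1,000 docs on the local shard...
        double localIdf = idf(1, 1_000);
        // ...but in 400 of the 4,000 docs across all shards of the collection.
        double globalIdf = idf(400, 4_000);

        System.out.printf(Locale.ROOT, "local idf  = %.3f%n", localIdf);
        System.out.printf(Locale.ROOT, "global idf = %.3f%n", globalIdf);
    }
}
```

With these hypothetical counts the term looks far more distinctive to the local shard than to the collection as a whole, so it could be chosen as an "interesting" term locally even though it would rank lower globally.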

Re: Does CloudMLTQParser support getting related documents from different shards?

Erick Erickson
What results do you get when you just try it in cloud mode?

This is a _parser_; it’s only in charge of parsing the query and, in this
case, getting the relevant terms from the indicated document to add to the
query, while doing some sanity checking. The bits that distribute the query to
shards and collate the results are elsewhere.
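To make the parser/distribution split concrete: the `mlt` query parser is used like any other parser inside an ordinary search request sent to the collection, and (as far as I know) in SolrCloud mode that parser name is served by CloudMLTQParser while the search handler fans the request out to the shards. A minimal JDK-only sketch that just builds such a request URL; the base URL, the collection name `products`, the document id `doc-42` and the `qf` field `title` are hypothetical placeholders, and `mintf`/`mindf` are lowered so that small test collections return something:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class MltRequestUrl {

    /**
     * Builds a /select URL whose main query is the {!mlt} parser applied to one seed document.
     * The request targets the collection rather than a specific core, so the usual
     * distributed search path queries every shard and merges the results.
     */
    static String mltSelectUrl(String solrBaseUrl, String collection, String seedId, String qf) {
        // Local-params syntax: {!mlt qf=<field> mintf=1 mindf=1}<uniqueKey of the seed doc>
        String q = "{!mlt qf=" + qf + " mintf=1 mindf=1}" + seedId;
        return solrBaseUrl + "/" + collection + "/select?q="
                + URLEncoder.encode(q, StandardCharsets.UTF_8); // Charset overload needs Java 10+
    }

    public static void main(String[] args) {
        // Hypothetical host, collection, document id and field.
        System.out.println(mltSelectUrl("http://localhost:8983/solr", "products", "doc-42", "title"));
    }
}
```

Sending the printed URL with any HTTP client (or the equivalent SolrJ query) should exercise the stock parser without custom plugin code, which is an easy way to compare its results with the plugin's.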

Best,
Erick

> On Jan 3, 2020, at 5:43 PM, Arnold Bronley <[hidden email]> wrote:
> [...]

Re: Does CloudMLTQParser support getting related documents from different shards?

gnandre
> What results do you get when you just try it in cloud mode?

When I try it in SolrCloud mode, the part that deals with fetching the
results from the same core works fine. However, the part that deals with
fetching results from other cores does not work.

> This is a _parser_; it’s only in charge of parsing the query and, in this
> case, getting the relevant terms from the indicated document to add to the
> query, while doing some sanity checking. The bits that distribute the query
> to shards and collate the results are elsewhere.

AFAIK, the part that fetches the related documents is the following:

    MoreLikeThis mlt = new MoreLikeThis(req.getSearcher().getIndexReader());
    mlt.like(docId, curBoostFields, tie);

The mlt.like method returns a Lucene Query object. So should I just convert
this Query object to a SolrQuery object and handle the distributed calls
myself, by sending the request to the collection instead of to a particular
core? How would I convert a Lucene Query to a SolrQuery? Should I just use its
toString() method?
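On the toString() question: Query.toString() is intended mainly for debugging and is not guaranteed to re-parse into an equivalent query, so one hedged alternative is to build the Solr query string yourself from the (field, term, boost) triples that MoreLikeThis selected. The sketch below is JDK-only; the `WeightedTerm` class and the `toSolrQuery` helper are hypothetical, and the escaped character set is meant to mirror SolrJ's `ClientUtils.escapeQueryChars` (calling that method directly is preferable if SolrJ is on the classpath). Clauses are joined with OR to mirror the optional (SHOULD) clauses of the BooleanQuery that MoreLikeThis builds; field names are assumed not to need escaping:

```java
import java.util.List;
import java.util.StringJoiner;

public class MltTermsToSolrQuery {

    /** Hypothetical holder for one term chosen by MoreLikeThis and its boost. */
    static final class WeightedTerm {
        final String field;
        final String text;
        final float boost;

        WeightedTerm(String field, String text, float boost) {
            this.field = field;
            this.text = text;
            this.boost = boost;
        }
    }

    // Characters with special meaning in the standard Lucene/Solr query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    /** Backslash-escapes query-syntax characters and whitespace in a single term. */
    static String escapeQueryChars(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Turns weighted terms into "field:term^boost" clauses joined by OR. */
    static String toSolrQuery(List<WeightedTerm> terms) {
        StringJoiner joiner = new StringJoiner(" OR ");
        for (WeightedTerm t : terms) {
            joiner.add(t.field + ":" + escapeQueryChars(t.text) + "^" + t.boost);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        List<WeightedTerm> terms = List.of(
                new WeightedTerm("title", "solr", 2.0f),
                new WeightedTerm("body", "c++", 1.0f));
        System.out.println(toSolrQuery(terms));
    }
}
```

The resulting string can then be sent as `q` to the collection, so the normal distributed search runs it on every shard; note that the term selection itself would still reflect the statistics of whichever core ran MoreLikeThis.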

On Sat, Jan 4, 2020 at 9:21 AM Erick Erickson <[hidden email]> wrote:
> [...]