any way to post json document to a MoreLikeThisHandler?

Matt Work Coarr
Hello,

Using a MoreLikeThisHandler, I was hoping to be able to pass a JSON document
in the POST body (in the same format as a document indexed in my core, but
the document in the request is not, and should not be, added to the core).

I'm thinking it would handle an incoming document similar to how the
/update handler can split up a JSON document into the set of fields defined
in the schema (or auto-created fields).

For instance, my input document would look like this:

{
  "id": 1234,
  "field1": "blah blah blah",
  "field2": "foo bar",
  "field3": 112233
}

And then I want to be able to use the MoreLikeThis query parameters to
determine which fields are used in the MLT comparison.

Thanks,
Matt

Re: any way to post json document to a MoreLikeThisHandler?

Alexandre Rafalovitch
There are three ways to trigger MLT:
https://lucene.apache.org/solr/guide/7_4/morelikethis.html

MoreLikeThisHandler allows text to be supplied externally. Unfortunately, I
can't find a specific example demonstrating it, so I'm not sure if it
is just a blob of text or a document.

Regards,
   Alex.

In reply to Matt Work Coarr's message of 11 September 2018, 09:55.

Re: any way to post json document to a MoreLikeThisHandler?

Matt Work Coarr
Thanks, Alex. Yes, I've been using the MoreLikeThisHandler, but it takes
a block of text posted in the request body as input, not the structured JSON
that corresponds to the fields.
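
For reference, the text-only workflow described here can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a confirmed recipe: the core URL `http://localhost:8983/solr/mycore`, the handler path `/mlt`, and the helper names `flatten_document` and `build_mlt_request` are all hypothetical, and it assumes a MoreLikeThisHandler is registered at that path and reads the POST body as a content stream. The string fields are simply concatenated into one blob, which is exactly the limitation in question: the per-field structure is lost before Solr sees it.

```python
import json
import urllib.parse
import urllib.request

# Assumed values for illustration; adjust to your own core and handler path.
SOLR_CORE_URL = "http://localhost:8983/solr/mycore"
MLT_PATH = "/mlt"


def flatten_document(doc, fields):
    """Join the string values of the chosen fields into one text blob."""
    parts = [doc[f] for f in fields if isinstance(doc.get(f), str)]
    return "\n".join(parts)


def build_mlt_request(doc, fields, rows=10):
    """Build (but do not send) a POST request carrying the flattened text."""
    params = {
        "mlt.fl": ",".join(fields),       # fields MLT should compare against
        "mlt.mintf": 1,                   # low thresholds, since the input is short
        "mlt.mindf": 1,
        "mlt.interestingTerms": "list",   # report which terms drove the match
        "rows": rows,
        "wt": "json",
    }
    url = SOLR_CORE_URL + MLT_PATH + "?" + urllib.parse.urlencode(params)
    body = flatten_document(doc, fields).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )


if __name__ == "__main__":
    doc = json.loads(
        '{"id": 1234, "field1": "blah blah blah", '
        '"field2": "foo bar", "field3": 112233}'
    )
    req = build_mlt_request(doc, ["field1", "field2"])
    print(req.full_url)
    print(req.data.decode("utf-8"))
```

The request is only constructed here, not sent; passing it to `urllib.request.urlopen` would issue it against a running Solr instance.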

In reply to Alexandre Rafalovitch's message of Tue, Sep 11, 2018, 10:14 AM.

Re: any way to post json document to a MoreLikeThisHandler?

Alexandre Rafalovitch
Hmm.

I guess the issue is that the request handler is the one doing the parsing,
so the input document can be in XML or JSON or CSV. And MLT as a handler is
then a competing endpoint.

So you actually want to use it later in a pipeline, but with a document
constructed on the fly and not stored.

This may not exist right now. Though maybe some combination of
DumpRequestHandler and MLT as a search component could do the trick?

I would be curious to know if it can be made to work out of the box.
Otherwise, patches are welcome. But they should not expect just the JSON
input format.
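
To illustrate the point about input formats, here is a rough sketch of format-agnostic parsing done outside Solr, where each format is reduced to the same field dictionary before any MLT step. It is not Solr code: the function name `parse_document` and the content-type dispatch are hypothetical, and the XML branch assumes documents shaped like Solr's `<doc><field name="...">` update syntax.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def parse_document(body, content_type):
    """Reduce one JSON, XML, or CSV document to a plain {field: value} dict."""
    if content_type == "application/json":
        return dict(json.loads(body))
    if content_type in ("application/xml", "text/xml"):
        root = ET.fromstring(body)
        # Accept either a bare <doc> or one wrapped in <add>.
        doc = root if root.tag == "doc" else root.find(".//doc")
        if doc is None:
            raise ValueError("no <doc> element found")
        return {f.get("name"): (f.text or "") for f in doc.findall("field")}
    if content_type in ("text/csv", "application/csv"):
        # The header row names the fields; only the first data row is used.
        rows = list(csv.DictReader(io.StringIO(body)))
        if not rows:
            raise ValueError("CSV input has no data row")
        return dict(rows[0])
    raise ValueError("unsupported content type: " + content_type)


if __name__ == "__main__":
    xml_body = (
        '<add><doc><field name="id">1234</field>'
        '<field name="field2">foo bar</field></doc></add>'
    )
    print(parse_document(xml_body, "application/xml"))
    print(parse_document("id,field2\n1234,foo bar\n", "text/csv"))
```

Note that the XML and CSV branches yield string values only, while JSON keeps its native types, so a real implementation would still need the schema to interpret each field.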

Regards,
    Alex

In reply to Matt Work Coarr's message of Tue, Sep 11, 2018, 4:57 PM.

Re: any way to post json document to a MoreLikeThisHandler?

Matt Work Coarr
Thank you, Alex! I'll take a look at DumpRequestHandler and see if I can
pull pieces from there and from MLT.

I was also looking at DirectUpdateHandler2 to try to tease apart the logic
for parsing an incoming JSON document (or XML, or whatever the incoming
format is). Do you think that's worthwhile?

My thought was that this is what backs "/update", and that's how I'm loading
my JSON files.

It looks like DirectUpdateHandler2.split() might be relevant too.

Matt


In reply to Alexandre Rafalovitch's message of Tue, Sep 11, 2018, 5:13 PM.