Advice on how to work with pure JSON data.

russell.lemaster

I have looked at many examples of how to do what I want, but they tend to show only fragments or
are based on older versions of Solr. I'm hoping there are new features that make what I'm doing easier.

I am running version 6.5 and am testing by running in cloud mode but only on a single machine.

Basically, I have a large number of documents stored as JSON in individual files. I want to take each JSON
document and index it without any pre-processing. I also need to be able to write newly indexed
JSON data back to individual files in the same format.

For example, let's say I have a JSON document that looks like the following:

{
  "id" : "bb903493-55b0-421f-a83e-2199ea11e136",
  "productName_s" : "UsefulWidget",
  "productCategory_s" : "tool",
  "suppliers" : [
    {
      "id" : "bb903493-55b0-421f-a83e-2199ea11e221",
      "name_s" : "Acme Tools",
      "productNumber_s" : "10342UW"
    }, {
      "id" : "bb903493-55b0-421a-a83e-2199ea11e445",
      "name_s" : "Snappy Tools",
      "productNumber_s" : "ST-X100023"
    }
  ],
  "resellers" : [
    {
      "id" : "cc903493-55b0-421f-a83e-2199ea11e221",
      "name_s" : "Target",
      "productSKU_s" : "TA092310342UW"
    }, {
      "id" : "bc903493-55b0-421a-a83e-2199ea11e445",
      "name_s" : "Wal-Mart",
      "productSKU_s" : "029342ABLSWM"
    }
  ]
}

I know I can use the /update/json/docs handler to insert the above, but from what I understand, I'd have to set up parameters
telling it how to split out the children. Though that is a bit of a pain, I can make that happen.
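One possible way to keep the list names (`suppliers`, `resellers`) that, as described in this thread, get lost once children are indexed as a block, is to copy each child's list name into a field of its own before indexing. The sketch below is only an illustration under stated assumptions: the record has already been parsed into `Map`/`List` values (for example by a JSON library), and `type_s` is a hypothetical discriminator field (`product` on the parent, the list name on each child) that Solr does not add for you. The resulting maps could then be copied into SolrJ `SolrInputDocument`s, attaching each child with `addChildDocument`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlattenChildren {

    // Splits a parsed record into parent fields and tagged child maps.
    // "type_s" is a hypothetical discriminator field, not something Solr adds.
    @SuppressWarnings("unchecked")
    public static Map<String, Object> flatten(Map<String, Object> source,
                                              List<Map<String, Object>> childrenOut) {
        Map<String, Object> parent = new LinkedHashMap<>();
        parent.put("type_s", "product"); // hypothetical parent marker
        for (Map.Entry<String, Object> e : source.entrySet()) {
            Object value = e.getValue();
            boolean isChildList = value instanceof List
                    && !((List<?>) value).isEmpty()
                    && ((List<?>) value).get(0) instanceof Map;
            if (!isChildList) {
                parent.put(e.getKey(), value); // plain or multi-valued field stays on the parent
                continue;
            }
            for (Object item : (List<?>) value) {
                // Copy so the caller's parsed record is not modified.
                Map<String, Object> child = new LinkedHashMap<>((Map<String, Object>) item);
                child.put("type_s", e.getKey()); // remember the list name, e.g. "suppliers"
                childrenOut.add(child);
            }
        }
        return parent;
    }

    public static void main(String[] args) {
        Map<String, Object> supplier = new LinkedHashMap<>();
        supplier.put("id", "bb903493-55b0-421f-a83e-2199ea11e221");
        supplier.put("name_s", "Acme Tools");

        Map<String, Object> reseller = new LinkedHashMap<>();
        reseller.put("id", "bc903493-55b0-421a-a83e-2199ea11e445");
        reseller.put("name_s", "Wal-Mart");

        Map<String, Object> product = new LinkedHashMap<>();
        product.put("id", "bb903493-55b0-421f-a83e-2199ea11e136");
        product.put("productName_s", "UsefulWidget");
        product.put("suppliers", Collections.singletonList(supplier));
        product.put("resellers", Collections.singletonList(reseller));

        List<Map<String, Object>> children = new ArrayList<>();
        Map<String, Object> parent = flatten(product, children);
        System.out.println(parent);   // parent fields only
        System.out.println(children); // two children, tagged "suppliers" and "resellers"
    }
}
```

Lists whose first element is not an object (e.g. a list of strings) are treated as ordinary multi-valued fields on the parent; that rule is an assumption of this sketch.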

The problem is that, when I then query for the data, the children come back under _childDocuments_ instead of under the
names of the original child lists (suppliers, resellers). So, how can I have Solr return the document as it was originally
indexed (I know it would be embedded in the results structure, but I can deal with that)?
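If each child was indexed with a hypothetical discriminator field, say `type_s` holding the name of the list it came from, the flat child list returned under `_childDocuments_` (for instance with `fl=*,[child parentFilter=type_s:product]`) could be regrouped on the client. This is a workaround sketch, not a built-in Solr feature. It works on plain maps so it runs without SolrJ; with SolrJ, the parent `SolrDocument` and the entries of its `getChildDocuments()` would first be copied into maps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RegroupChildren {

    // Rebuilds named child lists from the flat _childDocuments_ list.
    // Assumes each child carries the hypothetical discriminator field "type_s".
    @SuppressWarnings("unchecked")
    public static Map<String, Object> regroup(Map<String, Object> parent,
                                              List<Map<String, Object>> children) {
        Map<String, Object> out = new LinkedHashMap<>(parent);
        out.remove("type_s");           // hypothetical parent marker, not part of the original JSON
        out.remove("_childDocuments_"); // replaced by the named lists built below
        for (Map<String, Object> child : children) {
            Map<String, Object> copy = new LinkedHashMap<>(child);
            Object relation = copy.remove("type_s");
            // Untagged children fall back to the generic name Solr uses.
            String key = relation == null ? "_childDocuments_" : relation.toString();
            List<Object> list = (List<Object>) out.computeIfAbsent(key, k -> new ArrayList<Object>());
            list.add(copy);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> parent = new LinkedHashMap<>();
        parent.put("id", "p1");
        parent.put("type_s", "product");

        Map<String, Object> supplier = new LinkedHashMap<>();
        supplier.put("id", "s1");
        supplier.put("type_s", "suppliers");

        Map<String, Object> reseller = new LinkedHashMap<>();
        reseller.put("id", "r1");
        reseller.put("type_s", "resellers");

        System.out.println(regroup(parent, Arrays.asList(supplier, reseller)));
        // prints {id=p1, suppliers=[{id=s1}], resellers=[{id=r1}]}
    }
}
```

Any other fields Solr returns, such as `_version_`, are carried over unchanged, so they would need to be dropped before writing the document back to a file.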

I am running version 6.5, and I am hoping there is a method that I simply haven't seen documented. If not, can someone
point me to some examples of how to do this another way?

If there is no easy way to do this with the current version, can someone point me to a good resource for writing my own
handlers?

Thank you.

Re: Advice on how to work with pure JSON data.

Mikhail Khludnev-2
This is one of the features of the epic
https://issues.apache.org/jira/browse/SOLR-10144.
Until it's done, the only way to achieve this is to set the many params of the
[subquery] document transformer properly:
https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents#TransformingResultDocuments-[subquery]

Note: here I assume that the children mapping is static, i.e. there is a limited
list of optional scopes. Indexing and searching arbitrary JSON is an esoteric
(XML-DB-like) problem.
Also, beware of https://issues.apache.org/jira/browse/SOLR-10500. I hope to
fix it soon.
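A rough sketch of what those params might look like with one `[subquery]` per child list, assuming a static set of list names as in the note above. Everything here is an assumption for illustration: a hypothetical `type_s` field that marks parents as `product` and names each child's list, and an indexed `_root_` field (within a block it holds the top-level parent's id, on the parent as well as the children). The `{!term f=_root_ v=$row.id}` form follows the `$row.` dereferencing style of the `[subquery]` page; in SolrCloud the subquery may need additional routing params, so check that page before relying on it.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SubqueryParams {

    // Builds request params that attach one [subquery] per child list.
    // Assumptions: hypothetical "type_s" discriminator on every document,
    // "_root_" indexed (it holds the top-level parent id within a block).
    public static Map<String, String> build(List<String> relations, int maxChildren) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "type_s:product"); // match parents only
        StringBuilder fl = new StringBuilder("*");
        for (String rel : relations) {
            fl.append(',').append(rel).append(":[subquery]");
            // $row.id is the id of the parent row currently being transformed.
            p.put(rel + ".q", "{!term f=_root_ v=$row.id}");
            // _root_ also matches the parent itself, so filter to this relation.
            p.put(rel + ".fq", "type_s:" + rel);
            p.put(rel + ".fl", "*");
            p.put(rel + ".rows", Integer.toString(maxChildren));
        }
        p.put("fl", fl.toString());
        return p;
    }

    public static void main(String[] args) {
        Map<String, String> params = build(Arrays.asList("suppliers", "resellers"), 100);
        params.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

With SolrJ, each entry could be passed to `SolrQuery.set(name, value)`. Each relation should then come back as its own result list under its own name, which is closer to the original document shape than a single `_childDocuments_` list.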

On Thu, Apr 20, 2017 at 10:15 PM, <[hidden email]> wrote:

>
> I have looked at many examples on how to do what I want, but they tend to
> only show fragments or they
> are based on older versions of Solr. I'm hoping there are new features
> that make what I'm doing easier.
>
> I am running version 6.5 and am testing by running in cloud mode but only
> on a single machine.
>
> Basically, I have a large number of documents stored as JSON in individual
> files. I want to take that JSON
> document and index it without having to do any pre-processing, etc. I also
> need to be able to write newly indexed
> JSON data back to individual files in the same format.
>
> For example, let's say I have a json document that looks like the
> following:
>
> {
> "id" : "bb903493-55b0-421f-a83e-2199ea11e136",
> "productName_s" : "UsefulWidget",
> "productCategory_s" : "tool",
> "suppliers" : [
> {
> "id" : " bb903493-55b0-421f-a83e-2199ea11e221",
> "name_s" : "Acme Tools",
> "productNumber_s" : "10342UW"
> }, {
> "id" : " bb903493-55b0-421a-a83e-2199ea11e445",
> "name_s" : "Snappy Tools",
> "productNumber_s" : "ST-X100023"
> }
> ],
> "resellers" : [
> {
> "id" : "cc 903493-55b0-421f-a83e-2199ea11e221",
> "name_s" : "Target",
> "productSKU_s" : "TA092310342UW"
> }, {
> "id" : "bc903493-55b0-421a-a83e-2199ea11e445",
> "name_s" : "Wal-Mart",
> "productSKU_s" : "029342ABLSWM"
> }
> ]
> }
>
> I know I can use the /update/json/docs handler to insert the above but
> from what I understand, I'd have to set up parameters
> telling it how to split the children, etc. Though that is a bit of a pain,
> I can make that happen.
>
> The problem is that, when I then try to query for the data, it comes back
> with _childDocuments_ instead of the names of the
> child document lists. So, how can I have Solr return the document as it
> was originally indexed (I know it would be embedded
> in the results structure, but I can deal with that)?
>
> I am running version 6.5 and I am hoping there is a method I haven't seen
> documented that can do this. If not, can someone
> point me to some examples of how to do this another way.
>
> If there is no easy way to do this with the current version, can someone
> point me to a good resource for writing my own
> handlers?
>
> Thank you.
>
>
>
>
>
>
>
>
>


--
Sincerely yours
Mikhail Khludnev

Re: Advice on how to work with pure JSON data.

russell.lemaster
One thing I forgot to mention in my original post is that I wish to do this using the SolrJ client.
I have my own REST server that presents a common API to our users, but the back-end can be
anything I wish. I have been using "that other Lucene-based product" :), but I wish to stick to
a product that is more open and that perhaps I can contribute to.

I've searched for SolrJ examples of child documents, and unfortunately far too many of them
are based on older versions of Solr. Specifically, I would like to insert beans with multiple
child collections in them, but the latest I've read says this is not currently possible.
Is that still true?

In short, it isn't so important that REST-based requests / responses from Solr are pure JSON,
so long as I can do what I want from the Java client.

Do you know if there have been recent additions / enhancements up through 6.5 that make
this more straightforward?

Thanks


----- Original Message -----

From: "Mikhail Khludnev" <[hidden email]>
To: "solr-user" <[hidden email]>
Sent: Thursday, April 20, 2017 3:38:11 PM
Subject: Re: Advice on how to work with pure JSON data.

[...]

Re: Advice on how to work with pure JSON data.

Mikhail Khludnev-2
Hello,
See below.

On Fri, Apr 21, 2017 at 8:21 AM, <[hidden email]> wrote:

> One thing I forgot to mention in my original post is that I wish to do
> this using the SolrJ client.
> I have my own REST server that presents a common API to our users, but the
> back-end can be anything I wish. I have been using "that other Lucene-based
> product" :), but I wish to stick to a product that is more open and that
> perhaps I can contribute to.
>
> I've searched for SolrJ examples of child documents, and unfortunately far
> too many of them are based on older versions of Solr. Specifically, I would
> like to insert beans with multiple child collections in them, but the
> latest I've read says this is not currently possible. Is that still true?
>
Right. That's how it was done in SOLR-1945. Now it throws "cannot have more
than one Field with child=true".

>
> In short, it isn't so important that REST-based requests / responses from
> Solr are pure JSON, so long as I can do what I want from the Java client.
>
> Do you know if there have been recent additions / enhancements up through
> 6.5 that make this more straightforward?
>
Nothing new there.


--
Sincerely yours
Mikhail Khludnev