Parts of the Json response to a curl query are arrays, and parts are hashes


rhys J
Is there some reason that text_general fields are returned as arrays, and
other fields are returned as hashes in the json response from a curl query?

Here's my curl query:

curl "http://10.40.10.14:8983/solr/dbtr/select?indent=on&q=debtor_id:393291"

Here's the response:

"response":{"numFound":1,"start":0,"docs":[
      {
        "agent":[" "],
        "assign_id":["587"],
        "client_group":[" "],
        "credit_hold":false,
        "credit_limit":0.0,
        "credit_terms":["N30"],
        "currency":["USD"],
        "debtor_id":"393291",
        "dl1":["165743"],
        "dl2":["Great Plains"],
        "do_not_call":false,
        "do_not_report":false,
        "in_aris_date":"2009-10-19T00:00:00Z",
        "name1":["CRATE & BARREL"],
        "name2":[" "],
        "next_contact_date":"2019-10-17T00:00:00Z",
        "parent_customer_number":["215976"],
        "potential_bad_debt":true,
        "priority_followup":false,
        "reference_no":["165743"],
        "report_as":"CRATE & BARREL",
        "report_status":[" "],
        "risk":["Low"],
        "rms_acct_id":["Berger"],
        "salesperson":["Corp House"],
        "ssn1":["32"],
        "ssn2":["EXEMPT"],
        "status_code":["173"],
        "status_date":"2016-05-12T00:00:00Z",
        "watch_list":[0],
        "_version_":1648384727255613441,
        "data_signature":"f020b831dd6e553eed217125de13de850d1f4bbc"}]
  }}

As you can see, dates and booleans are hashes, and the text_general fields
(the only thing I can think of that is different) are arrays.

Why is this, and how can I make it return just a hash for the code to
handle?

One thing I did notice in the schema API is that even though I did not
choose MultiValued, it's set to true.

Is this a bug?

Thanks,

Rhys

Re: Parts of the Json response to a curl query are arrays, and parts are hashes

Shawn Heisey-2
On 10/25/2019 1:48 PM, rhys J wrote:
> Is there some reason that text_general fields are returned as arrays, and
> other fields are returned as hashes in the json response from a curl query?

<snip>

> Here's the response:

<snip>

>          "dl2":["Great Plains"],
>          "do_not_call":false,

There are no hashes inside the document.  If there were, they would be
surrounded by {} characters.  The whole document is a hash, which is why
it has {} characters.  Referring to the snippet that I included above,
dl2 is mapped in the hash to an array, and do_not_call is mapped to a
boolean, not a hash.

When there is an array in search results, it happens because the field
is multiValued ... even if there is only one value, it is placed in an
array for consistency.
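
For client code that wants one plain value per key, a minimal sketch of the distinction described above, assuming Python on the consuming side (the thread does not say which language the client uses). The helper name flatten_single_values is hypothetical, and the sample document is a trimmed copy of the response posted earlier.

```python
import json

# Trimmed copy of the document from the response posted earlier in the thread.
raw = """
{"dl2": ["Great Plains"],
 "do_not_call": false,
 "debtor_id": "393291",
 "watch_list": [0]}
"""
doc = json.loads(raw)


def flatten_single_values(doc):
    """Unwrap one-element lists; leave scalars and longer lists untouched.

    Hypothetical helper for illustration only, not part of any Solr client.
    """
    return {
        key: value[0] if isinstance(value, list) and len(value) == 1 else value
        for key, value in doc.items()
    }


flat = flatten_single_values(doc)

# Values from multiValued fields arrive as lists; the rest arrive as scalars.
array_fields = sorted(k for k, v in doc.items() if isinstance(v, list))
```

This only hides the array on the client. Lists with more than one element are left alone, because unwrapping those would silently drop values.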

Thanks,
Shawn

Re: Parts of the Json response to a curl query are arrays, and parts are hashes

rhys J
> <snip>
>
> >          "dl2":["Great Plains"],
> >          "do_not_call":false,
>
> There are no hashes inside the document.  If there were, they would be
> surrounded by {} characters.  The whole document is a hash, which is why
> it has {} characters.  Referring to the snippet that I included above,
> dl2 is mapped in the hash to an array, and do_not_call is mapped to a
> boolean, not a hash.
>
> When there is an array in search results, it happens because the field
> is multiValued ... even if there is only one value, it is placed in an
> array for consistency.
>

So I went back to one of the fields that shows up as multi-valued, even
though I explicitly did not choose that when I created it, and I
re-created it.

It still came back with multi-valued set to true.

Why is this?

Thanks,

Rhys

>
> Thanks,
> Shawn
>

Re: Parts of the Json response to a curl query are arrays, and parts are hashes

Shawn Heisey-2
On 10/25/2019 2:30 PM, rhys J wrote:
> So I went back to one of the fields that is multi-valued, which I
> explicitly did not choose when I created the field, and I re-created it.
>
> It still made the field multi-valued as true.
>
> Why is this?

Did you reload the core/collection or restart Solr so the new schema
would take effect? If it's SolrCloud, did you upload the changes to
zookeeper and then reload the collection?  SolrCloud does not use config
files on disk.

Assuming the answers to the above are yes, did you wipe the index and
rebuild it?  If not, there may be something in the Lucene index left
over from existing documents that indicates the multiValued status.  I
do not know if that's the case, but it might be.

What is the schema version in your schema? If it's not specified, it
might be 1.0.  The recommended version will depend on the Solr version
... I think the latest version is 1.6, but it might have advanced to 1.7.
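
A rough sketch of the checks above, assuming a standalone (non-SolrCloud) install and reusing the host and core name from the curl query earlier in the thread. It only builds the URLs; the function names are hypothetical. The CoreAdmin RELOAD action and the Schema API endpoints for fields and version are standard Solr endpoints, but verify them against the reference guide for your release.

```python
from urllib.parse import urlencode

# Host and core name taken from the curl query posted earlier in the thread.
SOLR = "http://10.40.10.14:8983/solr"
CORE = "dbtr"


def reload_core_url(core=CORE):
    """CoreAdmin call that reloads a core so schema changes take effect."""
    return f"{SOLR}/admin/cores?" + urlencode({"action": "RELOAD", "core": core})


def field_info_url(field, core=CORE):
    """Schema API call for one field; showDefaults=true also lists the
    properties the field inherits from its field type."""
    return f"{SOLR}/{core}/schema/fields/{field}?" + urlencode({"showDefaults": "true"})


def schema_version_url(core=CORE):
    """Schema API call that reports the schema version."""
    return f"{SOLR}/{core}/schema/version"


urls = [reload_core_url(), field_info_url("dl2"), schema_version_url()]
```

Each URL can be passed to curl the same way as the select query. On SolrCloud the reload would go through the Collections API instead.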

Thanks,
Shawn

Re: Parts of the Json response to a curl query are arrays, and parts are hashes

rhys J
> Did you reload the core/collection or restart Solr so the new schema
> would take effect? If it's SolrCloud, did you upload the changes to
> zookeeper and then reload the collection?  SolrCloud does not use config
> files on disk.
>

So I have not done this part yet, but I noticed some things in the
managed-schema.

The first was this. (I did verify that the version of the schema is
up to date; I am doing an out-of-the-box install of the latest Solr release.)

I checked all the fields that I created (I will paste them below), and they
are NOT multi-valued. However, text_general is set to multi-valued as a
default?

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Here are some of the fields I created through the API. When I created them,
I did NOT check the multi-valued box at all. However, when I then go to
look at the field through the API, it is marked Multi-valued. I am assuming
this is because of the fieldType definition above? Why is this set to
default to Multi-valued?

Will I break Solr if I change this to default to not multi-valued?

Thanks,

Rhys

Re: Parts of the Json response to a curl query are arrays, and parts are hashes

rhys J
I forgot to include the fields created through the API:

<field name="assign_id" type="text_general" uninvertible="true" indexed="true" stored="true"/>
<field name="client_group" type="text_general" uninvertible="true" sortMissingLast="true" indexed="true" stored="true"/>
<field name="credit_class" type="text_general" uninvertible="true" sortMissingLast="true" indexed="true" stored="true"/>
<field name="credit_hold" type="boolean" uninvertible="true" indexed="true" stored="true"/>
<field name="credit_hold_date" type="pdate" uninvertible="true" sortMissingLast="true" indexed="true" stored="true"/>
<field name="credit_limit" type="pfloat" uninvertible="true" sortMissingLast="true" indexed="true" stored="true"/>
<field name="credit_terms" type="text_general" uninvertible="true" sortMissingLast="true" indexed="true" stored="true"/>
<field name="currency" type="text_general" uninvertible="true" sortMissingLast="true" indexed="true" stored="true"/>
<field name="data_signature" type="string" uninvertible="false" indexed="false" stored="false"/>
<field name="debtor_id" type="text_general" multiValued="false" indexed="true" required="true" stored="true"/>
<field name="dl1" type="text_general" uninvertible="true" sortMissingLast="true" indexed="true" stored="true"/>
<field name="dl2" type="text_general" uninvertible="true" sortMissingLast="true" indexed="true" stored="true"/>
<field name="do_not_call" type="boolean" uninvertible="true" indexed="true" stored="true"/>
<field name="do_not_call_date" type="pdate" uninvertible="true" sortMissingLast="true" indexed="true" stored="true"/>
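
One way to see how these definitions line up with the response: a field that does not set multiValued itself takes the value from its fieldType. The sketch below is illustrative only. It uses three of the fields above plus the text_general type, wraps them in a bare schema element, and adds a boolean fieldType line that is an assumption (the thread never shows it). It also assumes a schema version of 1.1 or later, where an unset multiValued means false. The function name effective_multivalued is hypothetical.

```python
import xml.etree.ElementTree as ET

# Fields copied from the list above; the boolean fieldType is an assumed stand-in.
SCHEMA = """
<schema>
  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true"/>
  <fieldType name="boolean" class="solr.BoolField"/>
  <field name="debtor_id" type="text_general" multiValued="false" indexed="true" required="true" stored="true"/>
  <field name="dl2" type="text_general" uninvertible="true" sortMissingLast="true" indexed="true" stored="true"/>
  <field name="do_not_call" type="boolean" uninvertible="true" indexed="true" stored="true"/>
</schema>
"""

root = ET.fromstring(SCHEMA)


def effective_multivalued(root, field_name):
    """Return the multiValued setting a field ends up with.

    An explicit attribute on the field wins; otherwise the fieldType's value
    applies; otherwise False (assumes schema version 1.1 or later).
    """
    field = root.find(f"./field[@name='{field_name}']")
    own = field.get("multiValued")
    if own is not None:
        return own == "true"
    field_type = root.find(f"./fieldType[@name='{field.get('type')}']")
    return field_type.get("multiValued", "false") == "true"


results = {name: effective_multivalued(root, name)
           for name in ("debtor_id", "dl2", "do_not_call")}
```

This matches the response: debtor_id overrides the type with multiValued="false" and comes back as a plain string, while dl2 inherits true from text_general and comes back as an array.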

Thanks,

Rhys


Re: Parts of the Json response to a curl query are arrays, and parts are hashes

Shawn Heisey-2
On 10/28/2019 9:30 AM, rhys J wrote:
> Will I break Solr if i change this to default to not multi-valued?

If you are only indexing one value in those fields, then setting
multiValued to false will not break anything.

If an indexing request ever comes in that has more than one value for a
field that does not have multiValued set to true, that document (and any
others following it in the batch) will fail to index.  It is likely that
the reason the default is set to true is to avoid complaints from users
who DO send more than one value.

In situations where copyField is used to copy more than one field to a
target, the target must be multiValued, or that indexing will also fail.
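
For a field that only ever receives one value, a hedged sketch of overriding the type default on the field itself through the Schema API replace-field command. The host and core come from the curl query earlier in the thread, the dl2 properties are copied from the field list posted above, and the helper name replace_field_request is hypothetical. The request is built but not sent.

```python
import json
import urllib.request

# Host and core name taken from the curl query posted earlier in the thread.
SOLR = "http://10.40.10.14:8983/solr"
CORE = "dbtr"


def replace_field_request(name, **props):
    """Build (but do not send) a Schema API replace-field request.

    replace-field swaps in a complete new definition, so every property the
    field should keep has to be listed again.
    """
    body = {"replace-field": {"name": name, **props}}
    return urllib.request.Request(
        f"{SOLR}/{CORE}/schema",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = replace_field_request(
    "dl2",
    type="text_general",
    multiValued=False,  # explicit override of the text_general default
    indexed=True,
    stored=True,
    uninvertible=True,
    sortMissingLast=True,
)
# urllib.request.urlopen(req)  # uncomment to actually send the request
```

Existing documents were indexed under the old definition, so plan on reindexing after a change like this. Leave multiValued at true on any field that is a copyField target for more than one source.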

Thanks,
Shawn