JSON from Term Vectors Component

Doug Turnbull
Hi all,

I was curious whether anyone had tips on parsing the JSON response of the
term vectors component, or any way to force it to be more standard JSON. It
appears to be heavily nested and idiosyncratic JSON, such as below.

Notice the lists within lists within lists, where keys and values are adjacent
items in the same list. Is there a reason this isn't a JSON dictionary? Instead
you have to build a stateful list parser, which seems error-prone.

Any thoughts or ideas are very welcome; I probably just need to do
something rather simple here.

"termVectors": [
  "D100000", [
    "uniqueKey", "D100000",
    "body", [
      "1", [
        "positions", [
          "position", 92,
          "position", 113
        ]
      ],
      "10", [ ...

--
*Doug Turnbull **| CTO* | OpenSource Connections
<http://opensourceconnections.com>, LLC | 240.476.9983
Author: Relevant Search <http://manning.com/turnbull>
This e-mail and all contents, including attachments, is considered to be
Company Confidential unless explicitly stated otherwise, regardless
of whether attachments are marked as such.

Re: JSON from Term Vectors Component

MUNENDRA S N
On Thu, Feb 6, 2020 at 10:01 PM Doug Turnbull <[hidden email]> wrote:
> Notice the lists, within lists, within lists. Where the keys are adjacent
> items in the list. Is there a reason this isn't a JSON dictionary?

I think this is because of NamedList. Have you tried using json.nl=map as a
query parameter for this case?

Regards,
Munendra S N
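
For readers unfamiliar with json.nl, here is a minimal sketch of how one NamedList might be rendered under several of the styles described in the Solr response-writer documentation (flat, map, arrarr, arrmap). It does not call Solr; the helper name `render` is hypothetical, and nested values are not handled.

```python
import json

# A NamedList is an ordered sequence of (name, value) pairs; names may repeat.
pairs = [("position", 92), ("position", 113)]

def render(pairs, style):
    """Return JSON text for the pairs in the given json.nl style (illustrative only)."""
    if style == "flat":
        # Names and values alternate in a single array.
        return json.dumps([item for pair in pairs for item in pair])
    if style == "map":
        # A JSON object; built by hand because a Python dict cannot
        # hold the repeated names that end up as duplicate keys.
        body = ", ".join(f"{json.dumps(k)}: {json.dumps(v)}" for k, v in pairs)
        return "{" + body + "}"
    if style == "arrarr":
        # An array of two-element [name, value] arrays.
        return json.dumps([[k, v] for k, v in pairs])
    if style == "arrmap":
        # An array of single-entry objects.
        return json.dumps([{k: v} for k, v in pairs])
    raise ValueError(f"unknown style: {style}")

for style in ("flat", "map", "arrarr", "arrmap"):
    print(style, render(pairs, style))
```

Of these, arrarr and arrmap keep repeated names without relying on duplicate object keys, which may make them easier to consume from ordinary JSON libraries.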




Re: JSON from Term Vectors Component

Doug Turnbull
Thanks for the tip,

The issue is that json.nl=map produces non-standard JSON with duplicate keys.
Solr generates the following, which JSON lint rejects because of the repeated keys:

{
  "positions": {
    "position": 155,
    "position": 844,
    "position": 1726
  }
}


Re: JSON from Term Vectors Component

Walter Underwood
Repeated keys are quite legal in JSON, but many libraries don’t support that.

It does look like that data layout could be redesigned to be more portable.

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)


Re: JSON from Term Vectors Component

Doug Turnbull
Well that is interesting, I did not know that! Thanks Walter...

https://stackoverflow.com/questions/21832701/does-json-syntax-allow-duplicate-keys-in-an-object

I gave it a go in Python (which is what I'm using) to see what would happen.
Indeed, it gives some odd behavior: the first value is silently dropped.

In [4]: jsonStr = ' {"test": 1, "test": 2} '

In [5]: json.loads(jsonStr)
Out[5]: {'test': 2}


Re: JSON from Term Vectors Component

Walter Underwood
It is one of those things that happen when you don’t have a working group beat on a spec for six months. With an IETF process, I bet JSON would disallow duplicate keys and have comments. It might even have a datetime data type or at least recommend ISO 8601 in a string.

I was on the Atom working group. That is still a solid spec.

wunder

Re: JSON from Term Vectors Component

Edward Ribeiro
In reply to this post by Doug Turnbull
Python's json library converts text such as '{"id": 1, "id": 2}' to a dict,
which cannot hold duplicate keys, so only the last value survives. The
solution in this case is to inject your own parsing logic, as explained here:
https://stackoverflow.com/questions/29321677/python-json-parser-allow-duplicate-keys

One possible solution (below) is to turn the duplicate keys into key-list
pairs:

from json import JSONDecoder

jsonStr = '{"positions": {"position": 155, "position": 844, "position": 1726}}'

def dict_treat_duplicates(ordered_pairs):
    # Build a dict from (key, value) pairs, collecting repeated keys into a list.
    # Note: a value that is already a JSON array would be appended to as well.
    d = {}
    for k, v in ordered_pairs:
        if k in d:
            # duplicate key
            prev_v = d[k]
            if isinstance(prev_v, list):
                # already a list: append to it
                prev_v.append(v)
            else:
                # second occurrence: turn the value into a list
                d[k] = [prev_v, v]
        else:
            d[k] = v
    return d

decoder = JSONDecoder(object_pairs_hook=dict_treat_duplicates)
decoder.decode(jsonStr)

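will give you {'positions': {'position': [155, 844, 1726]}}, while passing
this hook as object_pairs_hook instead

def dict_raise_on_duplicates(ordered_pairs):
    # Despite the name, nothing is raised: the (key, value) pairs are returned as-is.
    return ordered_pairs

will give you [('positions', [('position', 155), ('position', 844),
('position', 1726)])]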

Best,
Edward
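
If the goal is to detect duplicate keys rather than tolerate them, the same object_pairs_hook mechanism can reject them outright. The following is a minimal sketch; the function name `reject_duplicates` is hypothetical and not taken from the thread.

```python
import json

def reject_duplicates(ordered_pairs):
    """object_pairs_hook that raises ValueError if a key repeats within one object."""
    result = {}
    for key, value in ordered_pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result

doc = '{"positions": {"position": 155, "position": 844}}'
try:
    json.loads(doc, object_pairs_hook=reject_duplicates)
except ValueError as err:
    print(err)  # duplicate key: 'position'
```

The hook is called once per JSON object, innermost first, so the check applies at every nesting level.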


Re: JSON from Term Vectors Component

Doug Turnbull
FWIW, I ended up writing some code that makes a best effort at turning the
named list into a dict representation. Where it can't (because keys repeat),
it keeps the entries as a list of Python tuples.

def every_other_zipped(lst):
    """ Pair up adjacent items: [k1, v1, k2, v2] -> (k1, v1), (k2, v2) """
    return zip(lst[0::2], lst[1::2])

def dictify(nl_tups):
    """ Return dict if all keys unique, otherwise
        don't modify """
    as_dict = dict(nl_tups)
    if len(as_dict) == len(nl_tups):
        return as_dict
    return nl_tups

def parse_named_list(lst):
    shallow_tups = list(every_other_zipped(lst))

    nl_as_tups = []

    for tup in shallow_tups:
        # Any list value is assumed to be a nested named list
        if isinstance(tup[1], list):
            tup = (tup[0], parse_named_list(tup[1]))
        nl_as_tups.append(tup)
    return dictify(nl_as_tups)


if __name__ == "__main__":
    # Terms "1" and "2" are siblings under "body"
    solr_nl = [
        "D100000", [
            "uniqueKey", "D100000",
            "body", [
                "1", [
                    "positions", [
                        "position", 92,
                        "position", 113
                    ]
                ],
                "2", [
                    "positions", [
                        "position", 22,
                        "position", 413
                    ]
                ]
            ]
        ]
    ]
    print(repr(parse_named_list(solr_nl)))



Outputs (reformatted here for readability)

{
  'D100000': {
    'uniqueKey': 'D100000',
    'body': {
      '1': {
        'positions': [('position', 92), ('position', 113)]
      },
      '2': {
        'positions': [('position', 22), ('position', 413)]
      }
    }
  }
}
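
As a variation, the pairing approach above can be combined with the earlier idea of collecting repeated keys into a list, so that no tuples remain in the result. This is only a sketch: the function name `named_list_to_dict` is hypothetical, and it assumes every list value in the response is itself a flat named list.

```python
def named_list_to_dict(flat):
    """Convert a flat [k1, v1, k2, v2, ...] named list into a dict,
    grouping the values of repeated keys into a list."""
    result = {}
    # Walk the flat list two items at a time: (key, value), (key, value), ...
    for key, value in zip(flat[0::2], flat[1::2]):
        if isinstance(value, list):
            # Assumption: any list value is a nested named list.
            value = named_list_to_dict(value)
        if key in result:
            existing = result[key]
            if isinstance(existing, list):
                # Third or later occurrence: extend the group.
                existing.append(value)
            else:
                # Second occurrence: start a group.
                result[key] = [existing, value]
        else:
            result[key] = value
    return result

solr_nl = ["D100000", ["uniqueKey", "D100000",
                       "body", ["1", ["positions", ["position", 92, "position", 113]]]]]
print(named_list_to_dict(solr_nl))
# {'D100000': {'uniqueKey': 'D100000', 'body': {'1': {'positions': {'position': [92, 113]}}}}}
```

One trade-off of this design is that a key occurring once maps to a bare value while a repeated key maps to a list, so callers may want to normalize "position" to a list in both cases.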

