TolerantUpdateProcessorFactory not functioning

Hup Chen
Hi,

My Solr indexing does not tolerate bad records; it simply exits, even though I have configured TolerantUpdateProcessorFactory in solrconfig.xml.
Please advise: how can I get TolerantUpdateProcessorFactory to work?

solrconfig.xml:

 <updateRequestProcessorChain name="tolerant-chain">
   <processor class="solr.TolerantUpdateProcessorFactory">
     <int name="maxErrors">100</int>
   </processor>
   <processor class="solr.RunUpdateProcessorFactory" />
 </updateRequestProcessorChain>

I restarted Solr before indexing:
service solr stop
service solr start

curl "http://localhost:7070/solr/mycore/update?update.chain=tolerant-chain&maxErrors=100" -d @test.json
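
For comparison, the same request can be put together in a script. This is only a sketch, assuming Python 3 and its standard library; `build_update_request` is a hypothetical helper name, and the host, port, core, and chain name are taken from the command above. It builds the request without sending it, so the URL and Content-Type can be inspected first.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_update_request(base_url, core, body, content_type,
                         chain="tolerant-chain", max_errors=100):
    """Build (but do not send) a POST to a core's /update handler."""
    # update.chain selects the processor chain defined in solrconfig.xml;
    # maxErrors is passed along as a request parameter, as in the curl call.
    query = urlencode({"update.chain": chain, "maxErrors": max_errors})
    url = f"{base_url}/{core}/update?{query}"
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": content_type})


req = build_update_request("http://localhost:7070/solr", "mycore",
                           b'[{"id": "1"}]',
                           "application/json; charset=utf-8")
print(req.full_url)
```

Passing the result to `urllib.request.urlopen(req)` would actually send it.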

The first record in test.json is a bad record; the rest were not indexed.

{
  "responseHeader":{
    "errors":[{
        "type":"ADD",
        "id":"0007264097",
        "message":"ERROR: [doc=0007264097] Error adding field 'usedshipping'='' msg=empty String"}],
    "maxErrors":100,
    "status":400,
    "QTime":0},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"Cannot parse provided JSON: Expected key,value separator ':': char=\",position=1240 AFTER='isbn\":\"4032171203\", \"sku\":\"\", \"title\":\"ãã³ãã¡ã¡ããã³ã \"author\"' BEFORE=':\"Sachiko OÃtomo\", ãã, \"ima'",
    "code":400}}


Fw: TolerantUpdateProcessorFactory not functioning

Hup Chen
Any ideas?
I still can't get TolerantUpdateProcessorFactory working: Solr exits on any error without any tolerance. Any suggestions would be appreciated.
curl "http://localhost:7070/solr/mycore/update?update.chain=tolerant-chain&maxErrors=100" -d @data.xml

<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <arr name="errors"/>
  <int name="maxErrors">100</int>
  <int name="status">400</int>
  <int name="QTime">1</int>
</lst>
<lst name="error">
  <lst name="metadata">
    <str name="error-class">org.apache.solr.common.SolrException</str>
    <str name="root-error-class">com.ctc.wstx.exc.WstxEOFException</str>
  </lst>
  <str name="msg">Unexpected EOF; was expecting a close tag for element &lt;field&gt;
 at [row,col {unknown-source}]: [1,8191]</str>
  <int name="code">400</int>
</lst>
</response>


________________________________
From: Hup Chen
Sent: Friday, May 29, 2020 7:29 PM
To: [hidden email] <[hidden email]>
Subject: TolerantUpdateProcessorFactory not functioning


Re: Fw: TolerantUpdateProcessorFactory not functioning

Thomas Corthals
If your XML or JSON can't be parsed, your content never makes it to the
update chain.

It looks like you're trying to index non-UTF-8 data. You can set the
encoding of your XML in the Content-Type header of your POST request.

-H 'Content-Type: text/xml; charset=GB18030'

JSON only allows UTF-8, UTF-16 or UTF-32.
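
If the source data is in some other encoding, one option is to transcode it to UTF-8 before posting. A minimal sketch, assuming Python 3 and that the source encoding is actually known (GB18030 is used here only because it is the example charset above); `to_utf8` is a hypothetical helper name:

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from source_encoding and re-encode them as UTF-8.

    Raises UnicodeDecodeError if the bytes are not valid in source_encoding,
    which is a useful early signal that the assumed encoding is wrong.
    """
    return raw.decode(source_encoding).encode("utf-8")


# Example: a short GB18030-encoded string converted to UTF-8.
sample = "书名".encode("gb18030")
converted = to_utf8(sample, "gb18030")
print(converted.decode("utf-8"))
```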

Best,

Thomas

On Tue, 9 Jun 2020 at 07:11, Hup Chen <[hidden email]> wrote:

> Any ideas?
> I still can't get TolerantUpdateProcessorFactory working: Solr exits on
> any error without any tolerance. Any suggestions would be
> appreciated.

Re: Fw: TolerantUpdateProcessorFactory not functioning

Hup Chen
Thanks for your reply. This is one of the examples where it fails. POSTing with charset=utf-8 or another charset didn't help with the CTRL-CHAR "^" error found in the title field. I hope Solr can simply skip this record and go on to index the rest of the data.

<add>
<doc>
 <field name="id">9780373773244</field>
 <field name="isbn13">9780373773244</field>
<field name="title">Missing: Innocent By Association^Zachary's Law (Hqn Romance) </field>
 <field name="author">Lisa_Jackson </field>
</doc>
</add>



curl "http://localhost:7070/solr/searchinfo/update?update.chain=tolerant-chain&maxErrors=100" -H 'Content-Type: text/xml; charset=utf-8' -d @data


<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <arr name="errors"/>
  <int name="maxErrors">100</int>
  <int name="status">400</int>
  <int name="QTime">0</int>
</lst>
<lst name="error">
  <lst name="metadata">
    <str name="error-class">org.apache.solr.common.SolrException</str>
    <str name="root-error-class">com.ctc.wstx.exc.WstxUnexpectedCharException</str>
  </lst>
  <str name="msg">Illegal character ((CTRL-CHAR, code 26))
 at [row,col {unknown-source}]: [1,225]</str>
  <int name="code">400</int>
</lst>
</response>

________________________________
From: Thomas Corthals <[hidden email]>
Sent: Tuesday, June 9, 2020 2:12 PM
To: [hidden email] <[hidden email]>
Subject: Re: Fw: TolerantUpdateProcessorFactory not functioning


Re: Fw: TolerantUpdateProcessorFactory not functioning

Shawn Heisey-2
On 6/9/2020 12:44 AM, Hup Chen wrote:

> Thanks for your reply. This is one of the examples where it fails. POSTing with charset=utf-8 or another charset didn't help with the CTRL-CHAR "^" error found in the title field. I hope Solr can simply skip this record and go on to index the rest of the data.

I tried your example XML as it is shown in your original message, saved
to a file named "foo.xml", and didn't have any trouble.  I wasn't even
using the tolerant update processor.   I just fired up the techproducts
example on a solr-8.3.0 download I already had, added a field named
"isbn13" (string type) so the schema was compatible, and tried the
following command:

curl "http://localhost:8983/solr/techproducts/update" -H 'Content-Type: text/xml; charset=utf-8' -d @foo.xml

I then tried it again with the ^Z (which is two characters) replaced by
an actual Ctrl-Z character.  When I did that, I got exactly the same
error you did.

A Ctrl-Z character (ASCII code 26) is *NOT* a valid character in XML,
which is why you're getting the error.

The tolerant update processor can't ignore errors in the actual format
of the input ... it only ignores errors during *indexing*.  This error
occurred during the input parsing, not during indexing, so the update
processor could not ignore it.

Thanks,
Shawn
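
The parse-versus-index distinction above can be checked on the client before anything is sent to Solr. A minimal sketch, assuming Python 3 and its standard-library parser (expat, not the Woodstox parser named in the error response, so edge cases may differ); `xml_parse_error` is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET


def xml_parse_error(payload: bytes):
    """Return None if payload is well-formed XML, else the parser's message."""
    try:
        ET.fromstring(payload)
    except ET.ParseError as exc:
        return str(exc)
    return None


# The same document with and without a literal Ctrl-Z (0x1A) in the title.
good = b'<add><doc><field name="title">Association^Zachary</field></doc></add>'
bad = b'<add><doc><field name="title">Association\x1aZachary</field></doc></add>'

print(xml_parse_error(good))
print(xml_parse_error(bad))
```

A payload that fails this check would be rejected while the input is being parsed, before any update processor chain sees it.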

Re: Fw: TolerantUpdateProcessorFactory not functioning

Hup Chen
Oh, I get it: that's not an indexing error!
It seems I need to first remove all characters in the range [\x00-\x1F] (except \x09 TAB, \x0A LF, and \x0D CR).

Thanks a lot!
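
The cleanup described above might look like the following sketch. It assumes Python 3 and that field values are available as strings before the XML is built; `strip_illegal_xml_chars` is a hypothetical name. It covers only the C0 control range mentioned in the post; XML 1.0 also excludes surrogates, U+FFFE, and U+FFFF, which are not handled here.

```python
import re

# C0 control characters that XML 1.0 forbids: everything below 0x20
# except TAB (0x09), LF (0x0A) and CR (0x0D).
_ILLEGAL_XML_CTRL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_illegal_xml_chars(text: str) -> str:
    """Remove control characters that an XML 1.0 parser would reject."""
    return _ILLEGAL_XML_CTRL.sub("", text)


# The title from the failing record, with a literal Ctrl-Z (0x1A) in it.
title = "Missing: Innocent By Association\x1aZachary's Law"
cleaned = strip_illegal_xml_chars(title)
print(cleaned)
```

Removing the character joins the neighbouring words; substituting a space instead of an empty string is an equally reasonable choice, depending on the data.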




________________________________
From: Shawn Heisey <[hidden email]>
Sent: Tuesday, June 9, 2020 3:19 PM
To: [hidden email] <[hidden email]>
Subject: Re: Fw: TolerantUpdateProcessorFactory not functioning



Re: Fw: TolerantUpdateProcessorFactory not functioning

Hup Chen
In reply to this post by Shawn Heisey-2

There was another error, which I think should count as an indexing error.
The listprice field below is a pdouble field; the update processor didn't ignore the error when it was sent bad data.

Response: {
  "responseHeader":{
    "status":400,
    "QTime":133551},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","java.lang.NumberFormatException"],
    "msg":"ERROR: [doc=978194537913] Error adding field 'listprice'='106Chapter' msg=For input string: \"106Chapter\"",
    "code":400}}

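Until the cause is found, one possible workaround is to screen numeric fields on the client before posting. The sketch below assumes Python 3 and documents held as dicts; `split_by_numeric_field` is a hypothetical helper, and Python's `float()` is only an approximation of how Solr parses a pdouble value, so the two may disagree on edge cases.

```python
def split_by_numeric_field(docs, field="listprice"):
    """Split docs into (good, bad) by whether `field` parses as a number.

    Documents that do not have the field at all are treated as good.
    """
    good, bad = [], []
    for doc in docs:
        value = doc.get(field)
        try:
            if value is not None:
                float(value)
            good.append(doc)
        except (TypeError, ValueError):
            bad.append(doc)
    return good, bad


docs = [
    {"id": "978194537913", "listprice": "106Chapter"},  # value from the error above
    {"id": "example-1", "listprice": "19.99"},          # made-up record for illustration
    {"id": "example-2"},                                # made-up record without the field
]
good, bad = split_by_numeric_field(docs)
print([d["id"] for d in bad])
```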
