[Solr8.7] Chinese ZH language ?

Bruno Mannina
Hello,



I would like to index Simplified Chinese (ZH) text, e.g. 一种新型太阳能坪床增温系统.

I added this lib directive to my solrconfig.xml:

<lib dir="${solr.install.dir:../../..}/contrib/analysis-extras/lucene-libs/" regex="lucene-analyzers-smartcn-8\.7\.0\.jar" />



First question: is that enough?



I also need your help defining the fieldType "text_zh" in my schema.xml, to use with the field below.

(PS: as with my other fields, I need highlighting.)



<field name="tizh" type="text_zh" multiValued="true" indexed="true"
stored="true" termVectors="true" termPositions="true" termOffsets="true"/>



And



    <!-- Simplified Chinese -->
    <!-- BRUNO -->
    <fieldType name="text_zh" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.HMMChineseTokenizerFactory"/>
        <filter class="solr.CJKWidthFilterFactory"/>
        <filter class="solr.StopFilterFactory"
                words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>



I get no error when I reload my core.

But I can't index Chinese data; I get this error:



POSTing file CN-0005.xml (application/xml) to [base]
SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for url: http://xxxx/solr/yyy/update
SimplePostTool: WARNING: Response: <?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">400</int>
  <int name="QTime">1</int>
</lst>
<lst name="error">
  <lst name="metadata">
    <str name="error-class">org.apache.solr.common.SolrException</str>
    <str name="root-error-class">java.lang.IllegalArgumentException</str>
  </lst>
  <str name="msg">Exception writing document id CN112091782A to the index; possible analysis error: cannot change field "tizh" from index options=DOCS to inconsistent index options=DOCS_AND_FREQS_AND_POSITIONS</str>
  <int name="code">400</int>
</lst>
</response>
SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 400 for URL: http://xxxx/solr/yyy/update



Thanks a lot for your help,

Bruno






Re: [Solr8.7] Chinese ZH language ?

Alexandre Rafalovitch
>possible analysis error: cannot change field "tizh" from

You have content indexed against an old, incompatible definition of that field. Records that were deleted but not yet purged still count.

Delete your index data, or change the field name during testing.

Regards,
    Alex
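
For the first option, a minimal sketch (not from the thread): assuming every document in the core can be re-posted afterwards, a delete-all message in Solr's XML update format clears the index. It would be sent to the same /update URL that appears in the error output and followed by a commit. If the error still shows up afterwards, stopping Solr and removing the core's index directory is the blunter alternative.

```xml
<!-- Sketch only: removes EVERY document in the core; re-index afterwards. -->
<!-- Assumption: POSTed as application/xml to the core's /update handler. -->
<delete>
  <query>*:*</query>
</delete>
```

For the second option, giving the field a new name in schema.xml while experimenting (tizh_test, say, a made-up name) sidesteps the clash, because the index holds no earlier record of that name.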
On Sun., Jan. 10, 2021, 9:19 a.m. Bruno Mannina, <[hidden email]> wrote:

> [...]
Reply | Threaded
Open this post in threaded view
|

RE: [Solr8.7] Chinese ZH language ?

Bruno Mannina
Yes, that was it. I re-indexed and it works fine.

Thanks!

-----Original Message-----
From: Alexandre Rafalovitch [mailto:[hidden email]]
Sent: Sunday, January 10, 2021 16:44
To: solr-user
Subject: Re: [Solr8.7] Chinese ZH language ?

[...]