[Solr8.7] Indexing only some language ?


Bruno Mannina
Hello,

I would like to define some text_xx fields in my schema.xml.

I have patent titles in several languages, but only 6 of them (EN, IT, FR, PT, ES, DE) interest me.

I know how to define these 6 fields: I use text_en, text_it, etc.

For example, for English:

<field name="tien" type="text_en" multiValued="false" indexed="true"
stored="true" termVectors="true" termPositions="true" termOffsets="true"/>

But the data contains more than these 6 languages: AR, CN, JP, KR, etc.

I can't analyze all the source files to detect every language and define each one in my schema.

I would like to use a dynamic field to index the other languages:

<dynamicField name="ti*" type="text_general" multiValued="false"
indexed="true" stored="true" omitTermFreqAndPositions="true"
omitNorms="true"/>

Is it OK to do that?

Will the tien field be indexed twice internally, or, since tien is already explicitly defined, will ti* skip it?

Thanks for your kind reply.

Sincerely,
Bruno

Re:[Solr8.7] Indexing only some language ?

xiefengchang
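Take a look at the documentation here: https://lucene.apache.org/solr/guide/8_7/dynamic-fields.html#dynamic-fields

Here's the key point: "a field that does not match any explicitly defined fields can be matched with a dynamic field."

So the priority is clear: explicitly defined fields win. tien is indexed once, with text_en, and ti* only applies to field names that have no explicit definition.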
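
To make that concrete, here is a minimal sketch of the two definitions from the question placed side by side in one schema.xml. The field names tiar, tijp, and tikr mentioned in the comments are hypothetical examples, not names taken from the original post.

```xml
<!-- Explicit field: a document field named exactly "tien" is analyzed with text_en. -->
<field name="tien" type="text_en" multiValued="false" indexed="true"
       stored="true" termVectors="true" termPositions="true" termOffsets="true"/>

<!-- Dynamic field: consulted only for names that have no explicit <field>,
     for example the hypothetical names "tiar", "tijp", or "tikr". -->
<dynamicField name="ti*" type="text_general" multiValued="false"
              indexed="true" stored="true" omitTermFreqAndPositions="true"
              omitNorms="true"/>
```

With a schema like this, tien should be handled only by its explicit definition, while a name such as tiar would fall through to ti* and text_general. Two things are worth checking, as far as I know: field names are case-sensitive, so documents need to send tien rather than TIEN, and ti* matches any otherwise-undefined name that starts with ti (a field called title, for instance), so a narrower prefix may be safer.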

At 2021-01-10 03:38:01, "Bruno Mannina" <[hidden email]> wrote:

> [snip: original message, quoted in full above]
RE: [Solr8.7] Indexing only some language ?

Bruno Mannina
Perfect! Thanks!

-----Original Message-----
From: xiefengchang [mailto:[hidden email]]
Sent: Sunday, January 10, 2021 04:50
To: [hidden email]
Subject: Re:[Solr8.7] Indexing only some language ?

> [snip: xiefengchang's reply and the original question, both quoted in full above]