SOLR Text Field


Dave Beckstrom
Hi Everyone,

I'm really hating SOLR.   All I want is to define a text field that data
can be indexed into and which is searchable.  Should be super simple.  But
I run into issue after issue.  I'm running SOLR 7.3 because it's compatible
with the version of NUTCH I'm running.

The docs say that SOLR ships with a default TextField but that seems to be
wrong.  I define:


<field name="metadata.myfield" type="TextField" stored="true" indexed="true"/>

The above throws the error "Unable to create core [MyCore] Caused by: Unknown
fieldType 'TextField' specified on field metadata.myfield".

Then I try:

<field name="metadata.myfield" type="Text" stored="true" indexed="true"/>

Same error.

Then as a workaround I got into defining a "Text_General" field because I
couldn't get Text to work.  Text_General extends the Text field which seems
to indicate there should be a text field built into SOLR!!!!!

Text_General causes a new set of problems.   How does one go about using
the supposed default text field available in SOLR?

When I defined Text_General:

 <updateProcessor class="solr.AddSchemaFieldsUpdateProcessorFactory"
name="add-schema-fields">
    <lst name="typeMapping">
      <str name="valueClass">java.lang.String</str>
      <str name="fieldType">text_general</str>
      <bool name="default">true</bool>
    </lst>

Text_General with type=string complains when I try to insert data that has
letters and numbers:

java.lang.Exception:
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://127.0.0.1:xxxx/solr/MyCore: ERROR: [doc=
http://xxx.xxx.com/services/mydocument.htm] Error adding field
'metatag.myfield'='15c0188' msg=For input string: "15c0188"

I'm very frustrated.  If anyone is able to help sort this out I would
really appreciate it!  What do I need to do to be able to define a simple
text field that is stored and searchable?

Thank you!

--
*Fig Leaf Software, Inc.* 
https://www.figleaf.com/

Full-Service Solutions Integrator

Re: SOLR Text Field

Tim Allison
TextField is a class name. Look in managed-schema and pick a field type by
name, e.g. text_general.
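
As a rough sketch of what that means for the field in the original post: the type attribute refers to the name of a <fieldType> that already exists in managed-schema, not to a Java class. This assumes the core was built from a configset that already defines text_general; if it does not, the same "Unknown fieldType" error will come back. The name is matched as written in the schema, so text_general is used here rather than Text_General.

```xml
<!-- Assumption: managed-schema already contains
     <fieldType name="text_general" class="solr.TextField" ...> -->
<field name="metadata.myfield" type="text_general" indexed="true" stored="true"/>
```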


Re: SOLR Text Field

David Hastings
In reply to this post by Dave Beckstrom
Wow. Ok dude, relax and take a nap. It sounds like you don’t even have a core defined. Maybe you do and I’m reaching a bit, but start there. Solr is super simple and only gets complicated when you’re complicated.


Re: SOLR Text Field

Shawn Heisey-2
In reply to this post by Dave Beckstrom
On 4/6/2019 6:59 AM, Dave Beckstrom wrote:

> I'm really hating SOLR.   All I want is to define a text field that data
> can be indexed into and which is searchable.  Should be super simple.  But
> I run into issue after issue.  I'm running SOLR 7.3 because it's compatible
> with the version of NUTCH I'm running.
>
> The docs say that SOLR ships with a default TextField but that seems to be
> wrong.  I define:
>
> <field name="metadata.myfield" type="TextField" stored="true"
> indexed="true"/>

That is a field definition.  In order for that to work, you must also
have a type definition named "TextField".

There are no default type definitions; your schema must contain all of
the types that you use.

Solr does include a field class named TextField, which can be used in a
type definition.
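
A minimal sketch of that pairing, for illustration only: a type definition built on the TextField class, plus a field that refers to the type by its name. The type name "text_simple" is hypothetical (any name works as long as the field's type attribute matches it), and the tokenizer and filter chosen here are just one plausible starting point, not a recommendation for every data set.

```xml
<!-- "text_simple" is a made-up type name for this example -->
<fieldType name="text_simple" class="solr.TextField" positionIncrementGap="100">
  <!-- A single analyzer with no type attribute is used for both indexing and querying -->
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- The field's type attribute matches the fieldType name above, not the class name -->
<field name="metadata.myfield" type="text_simple" indexed="true" stored="true"/>
```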

I'm going to paste a full (and quite short) schema below for a dovecot
index that I have been experimenting on.  Dovecot is a POP3/IMAP server,
for email, and it can use Solr as a search backend.  This schema defines
four types - string, long, boolean, and text.

You'll notice that the definition for "text" uses "solr.TextField" for
the class.  The fully qualified class name for this is actually
org.apache.solr.schema.TextField if you want to find it in the source
code.  The "solr." prefix on the class name is special syntax that Solr
uses to search multiple java packages.

----------------
<?xml version="1.0" encoding="UTF-8" ?>

<!--
For fts-solr:

This is the Solr schema file, place it into solr/conf/schema.xml. You may
want to modify the tokenizers and filters.
-->
<schema name="dovecot" version="1.5">
  <types>
    <!-- IMAP has 32bit unsigned ints but java ints are signed, so use longs -->
    <fieldType name="string" class="solr.StrField" />
    <fieldType name="long" class="solr.LongPointField" />
    <fieldType name="boolean" class="solr.BoolField" />

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.WordDelimiterGraphFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.FlattenGraphFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.WordDelimiterGraphFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="0"
                catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>

  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="uid" type="long" indexed="true" stored="true" required="true" />
    <field name="box" type="string" indexed="true" stored="true" required="true" />
    <field name="user" type="string" indexed="true" stored="true" required="true" />

    <field name="hdr" type="text" indexed="true" stored="false" />
    <field name="body" type="text" indexed="true" stored="false" />

    <field name="from" type="text" indexed="true" stored="false" />
    <field name="to" type="text" indexed="true" stored="false" />
    <field name="cc" type="text" indexed="true" stored="false" />
    <field name="bcc" type="text" indexed="true" stored="false" />
    <field name="subject" type="text" indexed="true" stored="false" />

    <!-- Used by Solr internally: -->
    <field name="_version_" type="long" indexed="true" stored="true"/>
  </fields>

  <uniqueKey>id</uniqueKey>
</schema>
----------------

> When I defined Text_General:
>
>   <updateProcessor class="solr.AddSchemaFieldsUpdateProcessorFactory"
> name="add-schema-fields">
>      <lst name="typeMapping">
>        <str name="valueClass">java.lang.String</str>
>        <str name="fieldType">text_general</str>
>        <bool name="default">true</bool>
>      </lst>

That is an update processor definition, not a type definition.  That
definition and its usage would both go in solrconfig.xml, not your
schema.  Update processors are not relevant to the message you posted.
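
For orientation, here is a hedged sketch of roughly where such a definition sits in solrconfig.xml. The processor element is the one quoted above, closed so it is well-formed; the chain that refers to it by name is an assumption modeled on Solr's schemaless-mode setup, and the chain name is illustrative. Note that the typeMapping only chooses which existing type to assign to automatically created fields holding string values. The text_general type itself still has to be defined as a fieldType in the schema.

```xml
<!-- solrconfig.xml (sketch, other settings omitted) -->
<config>
  <!-- Declared once, with a name that chains can refer to -->
  <updateProcessor class="solr.AddSchemaFieldsUpdateProcessorFactory" name="add-schema-fields">
    <lst name="typeMapping">
      <str name="valueClass">java.lang.String</str>
      <str name="fieldType">text_general</str>
      <bool name="default">true</bool>
    </lst>
  </updateProcessor>

  <!-- Assumption: a chain that uses the processor by name; the chain name is illustrative -->
  <updateRequestProcessorChain name="add-unknown-fields-to-the-schema"
                               processor="add-schema-fields">
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.DistributedUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>
</config>
```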

Thanks,
Shawn

Re: SOLR Text Field

Dave Beckstrom
In reply to this post by Dave Beckstrom
Shawn,

I can't thank you enough for taking the time to reply to my question and
for the info you shared.

In all my Googling, I don't believe I ever found a single example of how to
define a simple text field in SOLR.  I saw some examples of Text_General but
as you saw it wasn't what I needed.

Based on the info you provided I was able to get it working: Nutch now
crawls and indexes into SOLR without issue.  I rebuilt all my collections
with the proper field definitions using your example.

SOLR really should ship with a sample text field defined, even if commented
out and for example purposes only.  That would have been
most helpful.  Even a FAQ somewhere would have been helpful.

Anyway, you're the best and thank you again!!!!

Best,

Dave Beckstrom


Re: SOLR Text Field

Shawn Heisey-2
On 4/8/2019 10:27 AM, Dave Beckstrom wrote:
> SOLR really should ship with a sample text field defined, even if commented
> out and for example purposes only.  That would have been
> most helpful.  Even a FAQ somewhere would have been helpful.

There are two example configs in the latest version of Solr (8.0.0).
Some of the earlier versions include more than two.

In the latest download, check the solr-8.0.0/server/solr/configsets
directory.  There will be two directories there, each of which contains
a conf directory.

In the managed-schema file found in the conf directory, you will find
multiple examples of text field types.  The managed-schema in the
_default configset has the following type names that use the
solr.TextField class:

text_ws, text_general, text_en, text_en_splitting,
text_splitting_en_tight, text_general_rev, phonetic_en, lowercase,
descendent_path, ancestor_path, delimited_payloads_float,
delimited_payloads_int, delimited_payloads_string, text_ar, text_bg,
text_ca, text_cjk, text_cz, text_da, text_de, text_el, text_es, text_eu,
text_fa, text_fi, text_fr, text_ga, text_gl, text_hi, text_hu, text_hy,
text_id, text_it, text_ja, text_ko, text_lv, text_nl, text_no, text_pt,
text_ro, text_ru, text_sv, text_th, text_tr

There are also field definitions using most of the fieldType definitions
in the example config.

Solr's example configs fall into the "kitchen sink" category.  They
contain things that most users will NEVER need.

Thanks,
Shawn