Changing Field Assignments

Terry Steichen
I am using Solr (6.6.0) in the automatic mode (where it discovers
fields).  It's working fine with one exception.  The problem is that
the discovered "meta_creation_date" field is assigned the type
TrieDateField.

Unfortunately, that type is limited in a number of ways (such as sorting
and abbreviated date forms).  What I'd like to do is have that
("meta_creation_date") field assigned to a different type, like
DateRangeField.

Is it possible to accomplish this (during indexing) by creating a copy
field to a different type, and using the copy field in the query?  Or
via some kind of function operation (which I've never understood)?


Re: Changing Field Assignments

Yasufumi Mizoguchi
Hi,

You can do that by adding the following lines to managed-schema.

  <dynamicField name="*_range" type="date_range" indexed="true" stored="true"/>
  <fieldType name="date_range" class="solr.DateRangeField"/>
  <copyField source="*_date" dest="*_date_range"/>

After adding the above and re-indexing your documents, you will get a
result like the following.

{
  "responseHeader": {
    "status": 0,
    "QTime": 0,
    "params": {"q": "*:*", "indent": "on", "wt": "json", "_": "1528772599296"}
  },
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {
        "id": "test2",
        "meta_creation_date": ["2018-04-30T00:00:00Z"],
        "meta_creation_date_range": "2018-04-30T00:00:00Z",
        "_version_": 1603034044781559808
      },
      {
        "id": "test",
        "meta_creation_date": ["1944-04-02T00:00:00Z"],
        "meta_creation_date_range": "1944-04-02T00:00:00Z",
        "_version_": 1603034283921899520
      }
    ]
  }
}

thanks,
Yasufumi
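
The three schema entries above could also be sent through Solr's Schema
API instead of being typed into managed-schema by hand. The following is
only a sketch of the request body: the function name and its parameters
are hypothetical, and the body would be POSTed as JSON to
http://<host>:<port>/solr/<collection>/schema, where the host and
collection are placeholders.

```python
import json


def date_range_schema_commands(source_glob="*_date", suffix="_range"):
    """Build a Schema API body mirroring the three schema lines in the post.

    The field type comes first so that it exists before the dynamic field
    that refers to it (dicts keep insertion order in Python 3.7+).
    """
    return {
        "add-field-type": {
            "name": "date_range",
            "class": "solr.DateRangeField",
        },
        "add-dynamic-field": {
            "name": "*" + suffix,
            "type": "date_range",
            "indexed": True,
            "stored": True,
        },
        "add-copy-field": {
            "source": source_glob,
            "dest": source_glob + suffix,
        },
    }


if __name__ == "__main__":
    # Print the JSON body; sending it to Solr is left to the caller.
    print(json.dumps(date_range_schema_commands(), indent=2))
```

Once the documents are re-indexed, the copy can be queried with truncated
dates, for example meta_creation_date_range:[2018-04 TO 2018-05], which
is the kind of abbreviated form the original question asked about.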



Re: Changing Field Assignments

Alessandro Benedetti
On top of that, I would not recommend using schemaless mode in
production.
That mode is useful for experimenting and prototyping, but with an
explicitly managed schema you would have much more control over a
production instance.

Regards



-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Changing Field Assignments

Shawn Heisey-2
In reply to this post by Terry Steichen
On 6/11/2018 2:02 PM, Terry Steichen wrote:

> I am using Solr (6.6.0) in the automatic mode (where it discovers
> fields).  It's working fine with one exception.  The problem is that
> the discovered "meta_creation_date" field is assigned the type
> TrieDateField.
>
> Unfortunately, that type is limited in a number of ways (such as sorting
> and abbreviated date forms).  What I'd like to do is have that
> ("meta_creation_date") field assigned to a different type, like
> DateRangeField.
>
> Is it possible to accomplish this (during indexing) by creating a copy
> field to a different type, and using the copy field in the query?  Or
> via some kind of function operation (which I've never understood)?

What you are describing is precisely why I never use the mode where Solr
automatically adds unknown fields.

If the field does not exist in the schema before you index the document,
then the best Solr can do is precisely what is configured in the update
processor that adds unknown fields.  You can adjust that config, but it
will always be a general purpose guess.

What is actually needed for multiple unknown fields is often outside
what that update processor is capable of detecting and configuring
automatically.  For that reason, I set up the schema manually, and I
want indexing to fail if the input documents contain fields that I
haven't defined.  Then whoever is doing the indexing can contact me with
their error details, and I can add new fields with the exact required
definition.
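
The "fail on undefined fields" policy described above is enforced by Solr
itself once the schemaless update chain is disabled. As a rough
client-side illustration only, an indexing program could run the same
check before sending a document. The field names below are hypothetical,
and fnmatch globbing is broader than Solr's dynamic-field patterns, which
allow a wildcard only at the start or end of the name.

```python
from fnmatch import fnmatchcase


def unknown_fields(doc, declared, dynamic_patterns=()):
    """Return the field names in doc that match no declared or dynamic field."""
    return sorted(
        name
        for name in doc
        if name not in declared
        and not any(fnmatchcase(name, pattern) for pattern in dynamic_patterns)
    )


# Hypothetical schema contents and input document.
declared = {"id", "title", "meta_creation_date"}
doc = {
    "id": "test",
    "meta_creation_date": "1944-04-02T00:00:00Z",
    "meta_author": "unknown",
}

missing = unknown_fields(doc, declared, dynamic_patterns=("*_range",))
if missing:
    # Report the problem instead of letting a guessed field type be created.
    print("refusing to index; undefined fields:", missing)
```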

Thanks,
Shawn


Re: Changing Field Assignments

Terry Steichen
Shawn,

I don't disagree at all, but have a basic question: How do you easily
transition from a system using a dynamic schema to one using a fixed one?

I'm running 6.6.0 in cloud mode (only because it's necessary, as I
understand it, to be in cloud mode for the authentication/authorization
to work).  In my server/solr/configsets subdirectory there are
directories "data_driven_schema_configs" and "basic_configs".  Both
contain a file named "managed-schema".  Which one is the active one?

From the Admin UI, each collection has an associated "managed-schema"
(under the "Files" option).  I'm guessing that this collection-specific
managed-schema is the result of the automated field discovery process,
presumably using some baseline version (in configsets) to start with.

If that's true, then it would presumably make sense to save this
collection-specific managed-schema to disk as schema.xml.  I further
presume I'd create a config subdirectory for each of said collections
and put schema.xml there.  Is that right?

And I have to do this for each collection, right?

Every time I read (and reread, and reread, ...) the Solr docs they seem
to be making certain (very basic) assumptions that I'm unclear about, so
your help in the preceding would be most appreciated.

Thanks.

Terry




Re: Changing Field Assignments

Shawn Heisey-2
On 6/14/2018 12:10 PM, Terry Steichen wrote:
> I don't disagree at all, but have a basic question: How do you easily
> transition from a system using a dynamic schema to one using a fixed one?

Not sure you need to actually transition.  Just remove the config in
solrconfig.xml that causes Solr to invoke the update chain where the
unknown fields are added, upload the new config to zookeeper, and reload
the collection.  When you do that, indexing with unknown fields will
fail, and if the indexing program has good error handling, somebody is
going to notice the failure.
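
The upload and reload steps described above can be sketched as follows.
This assumes the bin/solr zk upconfig command and the Collections API
RELOAD action; the ZooKeeper address, config name, directory, and
collection name are placeholders. The solrconfig.xml edit itself is not
shown. In the stock 6.x data-driven config the chain to remove is, as far
as I know, the one named add-unknown-fields-to-the-schema, so check your
own file before relying on that name.

```python
from urllib.parse import urlencode


def upconfig_command(zk_host, config_name, config_dir):
    """Argument list for uploading an edited config directory to ZooKeeper."""
    return [
        "bin/solr", "zk", "upconfig",
        "-z", zk_host,
        "-n", config_name,
        "-d", config_dir,
    ]


def reload_url(solr_base, collection):
    """Collections API URL that reloads a collection after the upload."""
    query = urlencode({"action": "RELOAD", "name": collection})
    return f"{solr_base}/admin/collections?{query}"


if __name__ == "__main__":
    # Placeholder values; substitute the real ZooKeeper host and names.
    print(" ".join(upconfig_command("localhost:9983", "myconf", "/tmp/myconf")))
    print(reload_url("http://localhost:8983/solr", "mycollection"))
```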

The major difficulty with this will be more of a people problem than a
technical problem.  You have to convince people who use the Solr install
that it's a lot better that they get an indexing error and ask you to
fix it.  They may not care that you've got a major problem on your hands
when the system makes a mistake adding a field.

> I'm running 6.6.0 in cloud mode (only because it's necessary, as I
> understand it, to be in cloud mode for the authentication/authorization
> to work).  In my server/solr/configsets subdirectory there are
> directories "data_driven_schema_configs" and "basic_configs".  Both
> contain a file named "managed-schema".  Which one is the active one?

As of Solr 6.5.0, the basic authentication plugin also works in
non-cloud (standalone) mode.

https://issues.apache.org/jira/browse/SOLR-9481

I will typically recommend cloud mode to anyone setting up a brand new
Solr installation, mostly because it automates a lot of the steps of
setting up high availability.  I don't use cloud mode myself, because it
didn't exist when I set up my systems.  Converting to cloud mode would
require rewriting all of the tools I've written that keep the indexes up
to date.  I might do that one day, but not today.

In cloud mode, neither of the managed-schema files you have mentioned is
active.  The active config (solrconfig.xml, the schema, and all files
mentioned in either of those) is in zookeeper, not on the disk.

> From the Admin UI, each collection has an associated "managed-schema"
> (under the "Files" option).  I'm guessing that this collection-specific
> managed-schema is the result of the automated field discovery process,
> presumably using some baseline version (in configsets) to start with.

If you create a collection with "bin/solr create", the config that you
give it is usually uploaded to zookeeper and all shard replicas in the
collection use that uploaded config.  In older versions like 6.6.0,
basic_configs is used if no source config is named.  In newer versions,
_default is used.

When the update processor adds an unknown field, it is added to the
managed-schema file in zookeeper and the collection is reloaded.  The
source configset on disk is not touched.

> If that's true, then it would presumably make sense to save this
> collection-specific managed-schema to disk as schema.xml.  I further
> presume I'd create a config subdirectory for each of said collections
> and put schema.xml there.  Is that right?

As long as you're in cloud mode, all your index configs are in
zookeeper.  Any config you have on disk is NOT what is actually being used.

https://lucene.apache.org/solr/guide/6_6/using-zookeeper-to-manage-configuration-files.html
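
Because the live config is in zookeeper, one way to see the schema a
collection is actually using is to ask Solr for it through the Schema
API rather than reading files on disk. This is a sketch under
assumptions: the base URL and collection name are placeholders, the
helper names are hypothetical, and an install with authentication enabled
would also need credentials, which are omitted here.

```python
from urllib.parse import quote
from urllib.request import urlopen


def live_schema_url(solr_base, collection, as_xml=False):
    """URL of the schema currently in use by a collection.

    With as_xml=True the response is requested in schema.xml form,
    which can be saved and edited as a starting point.
    """
    url = f"{solr_base}/{quote(collection)}/schema"
    return url + "?wt=schema.xml" if as_xml else url


def fetch_live_schema(solr_base, collection, as_xml=False, timeout=10):
    """Download the live schema as text (needs a reachable Solr node)."""
    target = live_schema_url(solr_base, collection, as_xml)
    with urlopen(target, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

The bin/solr zk downconfig command is an alternative that copies the
whole config set, solrconfig.xml included, from zookeeper to a local
directory.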

> Every time I read (and reread, and reread, ...) the Solr docs they seem
> to be making certain (very basic) assumptions that I'm unclear about, so
> your help in the preceding would be most appreciated.

The Solr documentation is not very friendly to novices.  Writing
documentation that an expert can use is sometimes difficult, but most
developers can manage it.  Writing documentation that a novice can use
is much harder, because it's not easy for someone who has intimate
knowledge of the system to step back and look at it from a place where
that knowledge isn't available.  Some success has been achieved in later
documentation versions.  It's going to take a lot of time and effort
before most of Solr's documentation is novice-friendly.

Thanks,
Shawn