Atomic Update w/ Date Copy Field

Atomic Update w/ Date Copy Field

Todd Long
We recently started using atomic updates in our application and have since noticed that date fields copied to a text field have varying results between full and partial updates. When the document is fully updated the copied text date appears as expected (i.e. yyyy-MM-dd'T'HH:mm:ss.SSSZ); however, when the document is partially updated (while omitting the date field) the original stored date value is copied to a different format (i.e. EEE MMM d HH:mm:ss z yyyy). I've included an example below of what we are seeing with the indexed value of our "createdDate_facet_t" field. Is there a way that we can force the copy field to always use "yyyy-MM-dd'T'HH:mm:ss.SSSZ" as the resulting text format without having to always include the field in the update?

schema
--------
<dynamicField name="*_dt" type="date" indexed="true" stored="true" />
<dynamicField name="*_t" type="text_general" indexed="true" stored="true" omitNorms="true" />
<dynamicField name="*_facet_t" type="text_general" indexed="true" stored="false" multiValued="true"
omitNorms="true" />

<copyField source="*_dt" dest="*_facet_t" />

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

/update (full)
-------------
{
  "id": "12345",
  "createdBy_t": "someone",
  "createdDate_dt": "2015-07-14T12:58:17.535Z"
}

createdDate_facet_t = "2015-07-14t12:58:17.535z"

/update (partial)
----------------
{
  "id": "12345",
  "createdBy_t": { "set": "another" }
}

createdDate_facet_t = "tue jul 14 12:58:17 utc 2015"

Re: Atomic Update w/ Date Copy Field

Todd Long
It looks like the issue has to do with the Date object. When the document is fully updated (with the date specified) the field is created with a String object so everything is indexed as it appears. When the document is partially updated (with the date omitted) the field is re-created using the previously stored Date object which takes the "toString" representation (i.e. EEE MMM dd HH:mm:ss zzz yyyy).
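
The two representations can be reproduced outside Solr with the JDK alone. The following is a minimal sketch, not Solr's actual code path: the class name `StoredDateToStringDemo` and the helper `asIndexed` are made up for illustration, and the helper only imitates the lowercasing step of the analyzer shown in the schema.

```java
import java.time.Instant;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class StoredDateToStringDemo {

    // Hypothetical helper: turns whatever object the text field receives into
    // a string and lowercases it, roughly as KeywordTokenizer + LowerCaseFilter would.
    public static String asIndexed(Object fieldValue) {
        return String.valueOf(fieldValue).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Pin the default zone so Date.toString() is reproducible (assumption: UTC server).
        TimeZone.setDefault(TimeZone.getTimeZone("UTC"));

        // Full update: the copy field receives the request string as-is.
        String fromRequest = "2015-07-14T12:58:17.535Z";

        // Partial update: the copy field receives a java.util.Date rebuilt from the stored value.
        Date fromStored = Date.from(Instant.parse(fromRequest));

        System.out.println(asIndexed(fromRequest)); // 2015-07-14t12:58:17.535z
        System.out.println(asIndexed(fromStored));  // tue jul 14 12:58:17 utc 2015
    }
}
```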

I ended up creating a DateTextField which extends TextField and simply overrides the "FieldType.createField(SchemaField, Object, float)" method. I then check for a Date instance and format as necessary.
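
The "check for a Date instance and format as necessary" step might look roughly like the sketch below. It is deliberately kept free of Solr classes so that it runs on its own: `DateTextNormalizer` and `normalize` are hypothetical names, and in a real `TextField` subclass the call would sit inside the overridden `createField` before delegating to the parent class. The pattern quotes the trailing `Z` as a literal and pins the formatter to UTC, which is an assumption about the desired output (an unquoted `Z` in `SimpleDateFormat` would print an offset such as `+0000` instead).

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;

public class DateTextNormalizer {

    // Always three fraction digits and a literal 'Z', rendered in UTC.
    private static final DateTimeFormatter ISO_MILLIS_UTC =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")
                    .withZone(ZoneOffset.UTC);

    // Returns an ISO-style string for Date values; passes everything else through untouched.
    public static Object normalize(Object value) {
        if (value instanceof Date) {
            // getTime() also works for java.sql.Date, whose toInstant() is unsupported.
            Instant instant = Instant.ofEpochMilli(((Date) value).getTime());
            return ISO_MILLIS_UTC.format(instant);
        }
        return value;
    }

    public static void main(String[] args) {
        Date stored = Date.from(Instant.parse("2015-07-14T12:58:17.535Z"));
        System.out.println(normalize(stored));     // 2015-07-14T12:58:17.535Z
        System.out.println(normalize("someone"));  // someone
    }
}
```

Because the formatter carries its own zone, the result does not depend on the JVM's default time zone or locale.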

Any ideas on a better approach or does it sound like this is the way to go? I wasn't sure if this could be accomplished in a filter or some other way.

Re: Atomic Update w/ Date Copy Field

Stefan Matheis-3
To me, it sounds more like you shouldn’t have to care about such gory details as a user - at all.

Would you mind opening an issue on JIRA, Todd? Including all the details you already provided, as well as a link to this thread, would be best.

Depending on what you actually did to find this all out, you probably even have a test case at hand that demonstrates the behaviour? If not, that’s obviously not a problem :)

-Stefan


On August 30, 2016 at 3:51:42 PM, Todd Long ([hidden email]) wrote:

> It looks like the issue has to do with the Date object. When the document is
> fully updated (with the date specified) the field is created with a String
> object so everything is indexed as it appears. When the document is
> partially updated (with the date omitted) the field is re-created using the
> previously stored Date object which takes the "toString" representation
> (i.e. EEE MMM dd HH:mm:ss zzz yyyy).
>  
> I ended up creating a DateTextField which extends TextField and simply
> overrides the "FieldType.createField(SchemaField, Object, float)" method. I
> then check for a Date instance and format as necessary.
>  
> Any ideas on a better approach or does it sound like this is the way to go?
> I wasn't sure if this could be accomplished in a filter or some other way.


Re: Atomic Update w/ Date Copy Field

Alexandre Rafalovitch
I noticed (and abused) the issue Todd described in my Solr puzzle at:
http://blog.outerthoughts.com/2016/04/solr-5-puzzle-magic-date-answer/

The second format ("EEE...") looks rather strange. I would suspect
that the Date->String conversion code is using the active locale and
that this is the default format for that locale. So, the bug might be that
the locale needs to be more specific to preserve consistency.

Regards,
    Alex.
----
Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 30 August 2016 at 23:25, Stefan Matheis <[hidden email]> wrote:

> To me, it sounds more like you shouldn’t have to care about such gory details as a user - at all.
>
> Would you mind opening an issue on JIRA, Todd? Including all the details you already provided, as well as a link to this thread, would be best.
>
> Depending on what you actually did to find this all out, you probably even have a test case at hand that demonstrates the behaviour? If not, that’s obviously not a problem :)
>
> -Stefan

Re: Atomic Update w/ Date Copy Field

Todd Long
Stefan Matheis-3 wrote:
> To me, it sounds more like you shouldn’t have to care about such gory details as a user - at all.
>
> Would you mind opening an issue on JIRA, Todd? Including all the details you already provided, as well as a link to this thread, would be best.
>
> Depending on what you actually did to find this all out, you probably even have a test case at hand that demonstrates the behaviour? If not, that’s obviously not a problem :)

Agreed on the gory details. Yes, it definitely seems like the format should be consistent between full and partial updates. I'll go ahead and open an issue on JIRA.

Alexandre Rafalovitch wrote:
> I noticed (and abused) the issue Todd described in my Solr puzzle at:
> http://blog.outerthoughts.com/2016/04/solr-5-puzzle-magic-date-answer/
>
> The second format ("EEE...") looks rather strange. I would suspect
> that the Date->String conversion code is using the active locale and
> that this is the default format for that locale. So, the bug might be that
> the locale needs to be more specific to preserve consistency.

Thank you for the Solr puzzle reference. The EEE format is most certainly the java.util.Date.toString() method being called when re-creating the field.
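
On the locale question, a small check can help separate the two explanations. The sketch below assumes a stock OpenJDK-style `java.util.Date`; the class name `DateToStringLocaleCheck` and the method `render` are hypothetical. It renders the same instant through `Date.toString()` under different default locales. In the JDK implementations in common use, the day and month abbreviations are fixed English strings, so the default locale should not change the output, while the zone abbreviation follows the JVM's default time zone.

```java
import java.time.Instant;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DateToStringLocaleCheck {

    // Renders the instant via Date.toString() under the given JVM defaults.
    // This mutates global JVM state, which is fine for a throwaway check
    // but not something to do inside a running server.
    public static String render(String iso, Locale locale, String zoneId) {
        Locale.setDefault(locale);
        TimeZone.setDefault(TimeZone.getTimeZone(zoneId));
        return Date.from(Instant.parse(iso)).toString();
    }

    public static void main(String[] args) {
        String iso = "2015-07-14T12:58:17.535Z";
        // If the format were locale-driven, these lines would differ.
        System.out.println(render(iso, Locale.US, "UTC"));
        System.out.println(render(iso, Locale.FRANCE, "UTC"));
        System.out.println(render(iso, Locale.JAPAN, "UTC"));
    }
}
```

If all three lines come out identical, the varying text is explained by `toString()` itself rather than by the server's locale settings.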