Date Query Confusion

Date Query Confusion

Terry Steichen
To me, one of the more frustrating things I've encountered in Solr is
working with date fields.  According to the documentation, this is
supposed to be straightforward, but in my experience it is anything
but.  In particular, I've found that the abbreviated forms of date
queries don't work as described.

If I create a query like creation_date: [2016-10-01 TO 2016-11-01], it
returns the set of documents created in the month of October 2016.
That's the good news.

But the abbreviated date queries (described in the Solr documentation
<https://lucene.apache.org/solr/guide/6_6/working-with-dates.html>)
don't work.  I tried creation_date: 2016-11, which is supposed to match
documents with any November 2016 date, but it actually produces:
"Invalid Date String:'2016-11'"

Solr also doesn't seem to let me sort on a date field.  I tried
creation_date asc, which produced: "can not sort on multivalued field:
creation_date"

In the AdminUI, if you go to the schema option for my collection and
examine creation_date, it shows it to be
org.apache.solr.schema.TrieDateField.  (This was automatically chosen by
the managed-schema.)

In that same AdminUI display, if I click "Load Term Info" I get a list
of dates, but when I click on one, it transforms it into a different
query form: {!term f=creation_date}2016-10-26T07:59:09.824Z  But this
query still produces 0 hits (even though the listing says it should
produce dozens of hits).

I imagine that I'm missing something basic here.  But I have no idea
what.  Any thoughts would be MOST welcome.

PS: I'm using Solr 6.6.0.
Re: Date Query Confusion

Erick Erickson
Yeah, dates are "special".

Those abbreviated dates are for DateRangeField, which is a distinct
type from the TrieDateField in your schema.
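For a field that stays on TrieDateField, one workaround is to expand the abbreviated date into an explicit range on the client before querying. The following is a minimal sketch of that idea, not something from the Solr distribution; the helper name expand_partial_date is hypothetical, and it assumes UTC and only the YYYY, YYYY-MM and YYYY-MM-DD forms. The closing "}" makes the upper bound exclusive.

```python
from datetime import datetime, timedelta, timezone


def expand_partial_date(partial: str) -> str:
    """Turn 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' into an explicit Solr range.

    Hypothetical helper: the lower bound is inclusive and the upper bound
    exclusive, so '2016-11' covers all of November 2016 and nothing else.
    """
    parts = [int(p) for p in partial.split("-")]
    if len(parts) == 1:
        start = datetime(parts[0], 1, 1, tzinfo=timezone.utc)
        end = datetime(parts[0] + 1, 1, 1, tzinfo=timezone.utc)
    elif len(parts) == 2:
        year, month = parts
        start = datetime(year, month, 1, tzinfo=timezone.utc)
        # December rolls over to January of the following year.
        end = datetime(year + month // 12, month % 12 + 1, 1, tzinfo=timezone.utc)
    elif len(parts) == 3:
        start = datetime(parts[0], parts[1], parts[2], tzinfo=timezone.utc)
        end = start + timedelta(days=1)
    else:
        raise ValueError(f"unsupported date form: {partial!r}")
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return f"[{start.strftime(fmt)} TO {end.strftime(fmt)}}}"


# Build the query string for the field discussed in this thread.
query = "creation_date:" + expand_partial_date("2016-11")
```

The resulting query uses full ISO-8601 timestamps, which is the form TrieDateField expects.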

bq. And Solr doesn't seem to let me sort on a date field

It's not the date field that's the problem, it's the "multiValued" part.
When you specify in your schema that the field is multiValued, it
means you can have more than one date in the doc. So how should it be
sorted? Newest first? Oldest first? Whatever you choose is wrong.
Again it's a schema change: set multiValued="false". You _might_ be
able to solve both problems by sorting via a function query (warning,
haven't tried this lately with date fields but "it should work"), see:
https://lucidworks.com/2015/09/10/minmax-on-multivalued-field/ The
catch there is that the field must have docValues="true".
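As a rough illustration of the function-query approach, the sketch below builds request parameters that sort on the earliest value of a multivalued field using the two-argument field(name,min) function. It carries the same caveat as above: it is untested against date fields here, and it assumes creation_date has docValues enabled. The helper name min_date_sort_params is hypothetical.

```python
from urllib.parse import urlencode


def min_date_sort_params(field: str, query: str = "*:*") -> dict:
    """Return select-handler parameters that sort by the smallest value
    of a multivalued docValues field (hypothetical helper)."""
    return {
        "q": query,
        # field(<name>,min) picks one value per document to sort on;
        # use max instead of min to sort on the latest date.
        "sort": f"field({field},min) asc",
    }


# URL-encode the parameters so they can be appended to a /select request.
query_string = urlencode(min_date_sort_params("creation_date"))
```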

This is why we strongly recommend against using "schemaless" mode in
production. Schemaless makes the best decision it can, but pretty soon
you will run into issues like these when your intended use isn't
supported.

bq.  {!term f=creation_date}2016-10-26T07:59:09.824Z

Well, "it works on my machine". I admit I had a Solr 6.1 version lying
around and used the techproducts example, where the date field is
defined as:
type="date"    indexed="true"  stored="true"
and "date" is:
<fieldType name="date" class="solr.TrieDateField" docValues="true"
precisionStep="0" positionIncrementGap="0"/>

The "tdate" field should work identically.

The critical bits here I believe are docValues=true and
multiValued=false by default.

So I'd start by trying the techproducts example "bin/solr start -e
techproducts" which will create docs as I did, and see if you have the
same problem, then use a similar field definition for your real
system.

And if you do change the schema, you need to blow away the entire
index ("rm -rf core/data"), or create a new collection if using
SolrCloud, and then re-index.
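With a managed schema, the field definition is usually changed through the Schema API rather than by editing the file by hand. The sketch below only builds the JSON body for a replace-field command; the type name "tdate" and the indexed/stored settings are assumptions based on the definitions quoted above, so check them against your own schema. The body would be POSTed to the collection's /schema endpoint, and the re-index is still required afterwards.

```python
import json


def replace_field_command(name: str, field_type: str) -> str:
    """Build a Schema API 'replace-field' body that makes a field
    single-valued with docValues (the two settings flagged above)."""
    command = {
        "replace-field": {
            "name": name,
            "type": field_type,   # assumed type name; must exist in your schema
            "indexed": True,
            "stored": True,
            "multiValued": False,  # lets a plain "creation_date asc" sort work
            "docValues": True,
        }
    }
    return json.dumps(command, indent=2)


payload = replace_field_command("creation_date", "tdate")
```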

Best,
Erick

On Thu, May 17, 2018 at 9:11 AM, Terry Steichen <[hidden email]> wrote:

Re: Date Query Confusion

Alessandro Benedetti
In reply to this post by Terry Steichen
Hi Terry,
let me go in order:

> Tried creation_date: 2016-11.  That's supposed to match
> documents with any November 2016 date.  But actually produces:
> "Invalid Date String:'2016-11'"

Is DateRangeField the field type for your "creation_date" field? [1]
You mentioned org.apache.solr.schema.TrieDateField; that is not going to
work. You need the specific field type I mentioned to use that date range
syntax.

> And Solr doesn't seem to let me sort on a date field.  Tried
> creation_date asc  Produced: "can not sort on multivalued field:
> creation_date"

Is your "creation_date" single valued?
If it is semantically single valued, make sure it is defined as single
valued in the schema.
Solr doesn't support sorting directly on multi valued fields.
Your schemaless configuration may have assigned the multi valued
attribute to that field.

From the Reference Guide [2]:
"Solr can sort query responses according to document scores or the value of
any field with a single value that is either indexed or uses DocValues (that
is, any field whose attributes in the Schema include multiValued="false" and
either docValues="true" or indexed="true" – if the field does not have
DocValues enabled, the indexed terms are used to build them on the fly at
runtime), provided that:"

Hope this helps,

Regards



[1]
https://lucene.apache.org/solr/guide/6_6/working-with-dates.html#WorkingwithDates-DateRangeFormatting
[2]
https://lucene.apache.org/solr/guide/6_6/common-query-parameters.html#CommonQueryParameters-ThesortParameter



-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
Re: Date Query Confusion

Tim Casey
In reply to this post by Terry Steichen
A simple date range query does not really represent how people query over
time and dates.  If you want any form of date query beyond a single
range, then a special field allowing tokenized queries will be the only
practical way to find documents.

A query for 'every Tuesday in November of 2017' would have to be written as
an OR clause over a set of date ranges.  With a tokenized date field, you
would just query for "+nov +tues +2017".  How you choose to tokenize a date
into a field will determine the types of queries you can run over the data.
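To make the tokenized-date idea concrete, here is a minimal sketch of producing such tokens at index time on the client side. The token vocabulary (a four-digit year plus short month and weekday names such as "nov" and "tues") and the function names are assumptions chosen to match the example query above, not an existing Solr feature.

```python
from datetime import date, timedelta

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
WEEKDAYS = ["mon", "tues", "wed", "thurs", "fri", "sat", "sun"]


def date_tokens(d: date) -> list:
    """Tokens to index alongside a document dated d (hypothetical scheme)."""
    return [str(d.year), MONTHS[d.month - 1], WEEKDAYS[d.weekday()]]


def matches(d: date, required: set) -> bool:
    """Emulate a '+nov +tues +2017' query: every required token must be present."""
    return required <= set(date_tokens(d))


# Find every Tuesday in November 2017 by scanning the whole year.
first = date(2017, 1, 1)
hits = [first + timedelta(days=i) for i in range(365)
        if matches(first + timedelta(days=i), {"nov", "tues", "2017"})]
```

The same three-token query would need four separate date ranges OR-ed together if only a plain date field were available.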

Another part of this: querying for a date range when the source material
itself has date ranges built into it is kinda odd, but it occurs.  If you
query for noon-1p, does that include notes from a meeting which started at
1130a but went for an hour?  You have to choose what to do.
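The two most common answers to that question are "overlaps the query window" and "falls entirely inside it". A small sketch of both, using the meeting example above (the dates and function names are made up for illustration):

```python
from datetime import datetime


def overlaps(q_start, q_end, d_start, d_end) -> bool:
    """True if the document interval shares any time with the query interval."""
    return d_start < q_end and q_start < d_end


def contained(q_start, q_end, d_start, d_end) -> bool:
    """True only if the document interval lies entirely inside the query interval."""
    return q_start <= d_start and d_end <= q_end


# Query window: noon to 1pm. Meeting: 11:30am, running for an hour.
q_start, q_end = datetime(2018, 5, 17, 12, 0), datetime(2018, 5, 17, 13, 0)
m_start, m_end = datetime(2018, 5, 17, 11, 30), datetime(2018, 5, 17, 12, 30)

meeting_overlaps = overlaps(q_start, q_end, m_start, m_end)
meeting_contained = contained(q_start, q_end, m_start, m_end)
```

Under the overlap rule the meeting notes match; under the containment rule they do not.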

tim
