Appropriate field type for date-without-time


Christopher Schultz
All,

I'd usually call this a "date", but Solr's documentation says that a
"date" is what I would call a timestamp (including time zone).

https://lucene.apache.org/solr/guide/7_3/field-types-included-with-solr.html

[ I remember reading but cannot currently seem to find a reference
page with the actual pre-defined field types Solr ships with. That
page above lists the class names, but not the aliases used by a real
Solr installation.

For example, if I want to store an integral numeric value, I know I
want to use "pint", but can't actually find the reference for that. ]

I have dates that have no timestamps on them, and I'd like to store
them and probably sort by them. I'm not sure whether we would care to
search for documents whose date fields are within a certain range,
etc. at this point.

I could convert the date into a number, e.g. 20180415 for today, and
simply store it as a "pint", but that might, ahem, surprise someone
looking at documents in the collection who expects an obvious
"date"-oriented field to be a date and finds that it is in fact an int.
Also, the year 10000 bug will rear its ugly head many generations from
now.
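
For illustration, the int-encoding idea above can be sketched in a few lines of Python. The function names are made up for this example and nothing here is Solr-specific; it only shows that the packed value sorts in calendar order and can be turned back into a date.

```python
from datetime import date


def encode_date_as_int(d: date) -> int:
    """Pack a calendar date into a sortable integer such as 20180415."""
    return d.year * 10000 + d.month * 100 + d.day


def decode_int_as_date(n: int) -> date:
    """Reverse of encode_date_as_int; raises ValueError for impossible dates."""
    year, rest = divmod(n, 10000)
    month, day = divmod(rest, 100)
    return date(year, month, day)
```

Because the year occupies the most significant digits, comparing two encoded values numerically gives the same order as comparing the dates themselves.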

Is there an existing appropriate field type for "date-without-time"?

Thanks,
-chris

Re: Appropriate field type for date-without-time

Shawn Heisey-2
On 4/15/2018 2:31 PM, Christopher Schultz wrote:
> I'd usually call this a "date", but Solr's documentation says that a
> "date" is what I would call a timestamp (including time zone).

That is correct.  Lucene dates are accurate to the millisecond.  They
don't actually handle timezones the way you might be thinking -- the
information is UTC.  When using date rounding (NOW/WEEK, NOW/DAY, etc)
you can tell Solr what the timezone is so that the boundaries are
correct, but the information in the index is UTC.
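
A rough Python sketch of why the timezone matters for day rounding: the same UTC instant falls into different "days" depending on the offset applied, yet the boundaries themselves are still expressed in UTC. This is only an illustration of the idea, not Solr's implementation; the helper name is hypothetical and a fixed UTC offset is assumed, so daylight saving is ignored.

```python
from datetime import datetime, timedelta, timezone


def day_bounds_utc(instant: datetime, utc_offset_hours: int):
    """Return (start, end) of the local calendar day containing `instant`,
    with both boundaries expressed in UTC. `instant` must be timezone-aware."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    local = instant.astimezone(local_tz)
    # Truncate to local midnight, then step forward one day for the end.
    start_local = local.replace(hour=0, minute=0, second=0, microsecond=0)
    end_local = start_local + timedelta(days=1)
    return start_local.astimezone(timezone.utc), end_local.astimezone(timezone.utc)


now = datetime(2018, 4, 15, 20, 31, tzinfo=timezone.utc)
utc_day = day_bounds_utc(now, 0)      # day boundaries for a UTC "day"
local_day = day_bounds_utc(now, -4)   # day boundaries for a UTC-4 "day"
```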

> https://lucene.apache.org/solr/guide/7_3/field-types-included-with-solr.html
>
> [ I remember reading but cannot currently seem to find a reference
> page with the actual pre-defined field types Solr ships with. That
> page above lists the class names, but not the aliases used by a real
> Solr installation.

That info is what you need to define the fieldType in the schema. So you
would put something like "solr.DatePointField" as the class.

https://lucene.apache.org/solr/guide/7_3/working-with-dates.html
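
As a sketch of how a class name and a short type name relate: a name like "pint" or "pdate" is simply the name given to a fieldType declaration in a schema, while the class is what the page above lists. The Python below builds payloads of the kind the Schema API accepts ("add-field-type" and "add-field"); the type name "pdate" and the field name "published_on" are assumptions chosen for the example, and nothing is sent to a server.

```python
import json

# Declares a field type: the short name on the left, the Solr class on the right.
add_field_type = {
    "add-field-type": {
        "name": "pdate",                 # assumed short name for the type
        "class": "solr.DatePointField",  # class named in this thread
        "docValues": True,               # useful for sorting
    }
}

# Declares a field that uses the type above ("published_on" is hypothetical).
add_field = {
    "add-field": {
        "name": "published_on",
        "type": add_field_type["add-field-type"]["name"],
        "stored": True,
    }
}

payloads = [json.dumps(p, indent=2) for p in (add_field_type, add_field)]
```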

> Is there an existing appropriate field type for "date-without-time"?

The answer to this question is not yes, but it's also not no.  All date
types in Solr have millisecond precision.

But if you use DateRangeField, you can deal with larger time periods.  A
query like "2018" actually works.  At both query and index time, the
less precise syntax is translated internally to a *range* before the
query or indexing happens.
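
The "less precise syntax becomes a range" idea can be sketched in Python as follows. This is not how DateRangeField is implemented; it is a minimal illustration with a hypothetical helper that handles only "YYYY", "YYYY-MM" and "YYYY-MM-DD" and returns a half-open [start, end) interval in UTC.

```python
from datetime import datetime, timedelta, timezone


def expand_truncated_date(text: str):
    """Expand 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' into a [start, end) UTC range."""
    parts = [int(p) for p in text.split("-")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported date expression: {text!r}")
    year = parts[0]
    month = parts[1] if len(parts) > 1 else 1
    day = parts[2] if len(parts) > 2 else 1
    start = datetime(year, month, day, tzinfo=timezone.utc)
    if len(parts) == 1:
        # A whole year: ends at the start of the next year.
        end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    elif len(parts) == 2:
        # A whole month: ends at the start of the next month.
        if month == 12:
            end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
        else:
            end = datetime(year, month + 1, 1, tzinfo=timezone.utc)
    else:
        # A single day: ends 24 hours later.
        end = start + timedelta(days=1)
    return start, end
```

Under this reading, a day-only value such as "2018-04-15" covers the whole day rather than one particular millisecond.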

Thanks,
Shawn

Re: Appropriate field type for date-without-time

Christopher Schultz
Shawn,

On 4/15/18 4:49 PM, Shawn Heisey wrote:

> On 4/15/2018 2:31 PM, Christopher Schultz wrote:
>> I'd usually call this a "date", but Solr's documentation says
>> that a "date" is what I would call a timestamp (including time
>> zone).
>
> That is correct.  Lucene dates are accurate to the millisecond.
> They don't actually handle timezones the way you might be thinking
> -- the information is UTC.  When using date rounding (NOW/WEEK,
> NOW/DAY, etc) you can tell Solr what the timezone is so that the
> boundaries are correct, but the information in the index is UTC.
>
>> https://lucene.apache.org/solr/guide/7_3/field-types-included-with-solr.html
>>
>> [ I remember reading but cannot currently seem to find a
>> reference page with the actual pre-defined field types Solr ships
>> with. That page above lists the class names, but not the aliases
>> used by a real Solr installation.
>
> That info is what you need to define the fieldType in the schema.
> So you would put something like "solr.DatePointField" as the
> class.

What about the "standard" aliases for existing fieldTypes? I remember
reading a page where "int" versus "pint" were compared, but I can't
seem to find that, now.

>> Is there an existing appropriate field type for
>> "date-without-time"?
>
> The answer to this question is not yes, but it's also not no.  All
> date types in Solr have millisecond precision.

Okay, so if I want to have a date-without-timestamp, I'll either need
to set all timestamps to 00:00:00 or invent something like
pint-encoded-date, right?

> But if you use DateRangeField, you can deal with larger time
> periods.  A query like "2018" actually works.  At both query and
> index time, the less precise syntax is translated internally to a
> *range* before the query or indexing happens.

Sounds like wasting a little space with 00:00:00 timestamps is
probably the way to go. Even if using pint would be equivalent (and
perhaps even a little more efficient), I think using a "real" date
field is more appropriate.
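
A minimal sketch of the 00:00:00 approach, assuming the client holds a plain calendar date and wants the ISO 8601 UTC string that Solr date fields accept. The helper name is hypothetical.

```python
from datetime import date


def to_midnight_utc_string(d: date) -> str:
    """Render a calendar date as midnight UTC, e.g. 2018-04-15T00:00:00Z."""
    return f"{d.year:04d}-{d.month:02d}-{d.day:02d}T00:00:00Z"
```

Pinning every value to midnight UTC keeps the field sortable by day and avoids a value drifting to the neighbouring day through a local-time conversion on the client.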

-chris

Re: Appropriate field type for date-without-time

Erick Erickson
> Sounds like wasting a little space with 00:00:00 timestamps is
> probably the way to go

What space? Under the covers it's just a long. The doc is slightly bigger
of course.
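
To make the "just a long" point concrete, here is a small Python sketch (the helper name is hypothetical) that converts a calendar date at midnight UTC into milliseconds since the Unix epoch. A date with a zeroed time takes exactly as much room as any other timestamp.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def midnight_utc_millis(d: date) -> int:
    """Milliseconds since the Unix epoch for midnight UTC on the given date."""
    midnight = datetime(d.year, d.month, d.day, tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating-point rounding.
    return (midnight - EPOCH) // timedelta(milliseconds=1)


millis = midnight_utc_millis(date(2018, 4, 15))
```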

You could also use the ParseDateFieldUpdateProcessorFactory; see the
reference guide. Its job is to take various input formats and transform
them to the canonical form.
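
The general idea of accepting several input formats and emitting one canonical form can be sketched client-side in Python. This does not reproduce the update processor's behaviour or configuration; the format list and function name are assumptions made for the example.

```python
from datetime import datetime

# Assumed input formats for this sketch; extend as needed.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")


def normalize_date(text: str) -> str:
    """Try each known format in turn and return YYYY-MM-DDT00:00:00Z."""
    cleaned = text.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            parsed = datetime.strptime(cleaned, fmt)
        except ValueError:
            continue  # not this format, try the next one
        return f"{parsed.year:04d}-{parsed.month:02d}-{parsed.day:02d}T00:00:00Z"
    raise ValueError(f"unrecognized date: {text!r}")
```

The order of the formats matters when inputs are ambiguous, so the most common format should come first.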

Best,
Erick