Newbie question about why represent timestamps as "float" values

billtorcaso


I have inherited a working SOLR installation, that has not been upgraded since solr 4.0.  My task is to bring it forward (at least 6.x, maybe 7.x).  I am brand new to SOLR.

Here is my question.  In schema.xml, there is this field:

        <field name="unixdate" type="float" indexed="true" stored="true" />

Question:  why is this declared as a float datatype?  I'm just looking for an explanation of what is there – any changes come later, after I understand things better.

I understand about milliseconds from the epoch.  I would expect that the author would have used an integer or a long integer to hold such a millisecond count, or a DateField or TrieDateField.
I wonder if there is some Solr magic at work.

Thanks,

  ---  Bill

Re: Newbie question about why represent timestamps as "float" values

Chris Hostetter-3

: Here is my question.  In schema.xml, there is this field:
:
:         <field name="unixdate" type="float" indexed="true" stored="true" />
:
: Question:  why is this declared as a float datatype?  I'm just looking
: for an explanation of what is there – any changes come later, after I
: understand things better.

You would have to ask the creator of that schema.xml file why they made
that choice ... to the best of my knowledge, no sample/example schema that
has ever shipped with any version of solr has ever included a "unixdate"
field -- let alone one that suggested "float" would be a logically correct
data type for storing that type of information.


-Hoss
http://www.lucidworks.com/

Re: Newbie question about why represent timestamps as "float" values

Erick Erickson
What Hoss said, and in addition somewhere some
custom code has to be translating things back and
forth. For dates, Solr wants YYYY-MM-DDTHH:MM:SSZ
as a date string it knows how to deal with. That simply
couldn't parse as a float type so there's some custom
code that transforms dates into a float at ingest
time and converts from float to something recognizable
as a date on output.
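
As a rough illustration of the kind of translation code described above, here is a minimal Java sketch. It assumes the "unixdate" value holds whole seconds since the Unix epoch, which the thread does not confirm; the class and method names are hypothetical and are not part of Solr. It only shows how an epoch value maps to the date string format Solr expects and back, using the standard java.time API.

```java
import java.time.Instant;

// Hypothetical helper, not part of Solr: names are for illustration only.
class UnixDateConverter {

    // Assumption: the input is whole seconds since 1970-01-01T00:00:00Z.
    // Instant.toString() produces an ISO-8601 UTC string ending in 'Z'.
    static String toSolrDate(long epochSeconds) {
        return Instant.ofEpochSecond(epochSeconds).toString();
    }

    // Reverse direction: parse an ISO-8601 UTC string back to epoch seconds.
    static long toEpochSeconds(String solrDate) {
        return Instant.parse(solrDate).getEpochSecond();
    }

    public static void main(String[] args) {
        String asDate = toSolrDate(1507572360L);
        System.out.println(asDate);                 // 2017-10-09T18:06:00Z
        System.out.println(toEpochSeconds(asDate)); // 1507572360
    }
}
```

If the stored values turn out to be milliseconds rather than seconds, Instant.ofEpochMilli and Instant.toEpochMilli would be the corresponding calls.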




Re: Newbie question about why represent timestamps as "float" values

alessandro.benedetti
Some time ago I came across a Solr installation that had the same problem, and
the author explained to me that the choice was made for performance reasons.
Apparently he was sure that handling everything as primitive types would
give a boost to Solr's searching/faceting performance.
I never agreed (one of the reasons being that you need to transform the floats
back to dates to actually render them in a readable format).

Furthermore, I tend to rely on standing on the shoulders of giants, so if a
community (not just a single developer) spent time implementing a date type
(with the different available implementations) specifically to manage date
information, I tend to trust them and believe that the best approach to
managing dates is to use that ad hoc date type (in its variants, depending on
the use case).

As a plus, using the right data type gives you immense power in debugging
and in understanding your data better.
Sticking with standards is also a good idea for proper maintenance.



-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Newbie question about why represent timestamps as "float" values

Michael Kuhlmann-5
While you're generally right, in this case it might make sense to stick
to a primitive type.

I see "unixdate" as technical information, probably from
System.currentTimeMillis(). As long as it's not used as a "real world"
date but only for sorting based on latest updates, or choosing which
document is more recent, it's totally okay to index it as a long value.

But definitely not as a float.
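
To make the float problem concrete, here is a small Java sketch. It assumes the timestamps are epoch seconds or epoch milliseconds (the thread does not say which), and the class and method names are hypothetical. A 32-bit float has a 24-bit significand, so at the magnitude of current epoch values neighbouring floats are far apart, and distinct timestamps can collapse into the same stored value.

```java
// Hypothetical demo class: shows precision loss when a timestamp is held in a float.
class FloatTimestampPrecision {

    // Converts a long to float and back, as a float-typed field would force.
    static long roundTripThroughFloat(long value) {
        return (long) (float) value;
    }

    public static void main(String[] args) {
        long t1 = 1507572360L; // 2017-10-09T18:06:00Z as epoch seconds
        long t2 = t1 + 30;     // 30 seconds later

        // The original value is not representable as a float.
        System.out.println(roundTripThroughFloat(t1)); // 1507572352

        // Two different timestamps become the same float.
        System.out.println((float) t1 == (float) t2);  // true

        // Gap between adjacent floats at this magnitude, in seconds.
        System.out.println(Math.ulp((float) t1));      // 128.0

        // Same check for epoch milliseconds: the gap is about 131 seconds.
        System.out.println(Math.ulp((float) 1507572360000L)); // 131072.0
    }
}
```

A long holds every one of these values exactly, so sorting and "which document is more recent" comparisons stay reliable.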

-Michael



Re: Newbie question about why represent timestamps as "float" values

Erick Erickson
Hold it. "date", "tdate", "pdate" _are_ primitive types. Under the
covers, date/tdate are just a tlong type; newer Solrs have a "pdate",
which is a point numeric type. All that these types do is some parsing
up front so you can send human-readable data (and get it back). But
under the covers it's still a primitive.

And the idea of making it a float is _certainly_ worse than a long.
Last time I checked, floats were more expensive to work with than
longs. If this was done for "efficiency" it wasn't done correctly.

It's vaguely possible that if this was done for efficiency, it was
done loooong ago when dates could be strings. Certainly there's a
performance argument there, but that hasn't been the case for a very
long time.

</rant>
Erick
