Date facetting and ranges overlapping

Guillaume Smet
Hi all,

I'm now using date facetting to browse events. It works really well
and is really useful. The only problem so far is that if I have an
event which is exactly on the boundary of two ranges, it is referenced
twice.

If we assume a gap of 6 hours starting from 2007-09-27 12:00, the
ranges are 2007-09-27 12:00 -> 18:00 and 2007-09-27 18:00 -> 00:00.
An event happening exactly at 18:00 is referenced in both ranges, so
if I select the first range, Solr returns both ranges in facet_dates
instead of only the first one.
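
To make the overlap concrete, here is a minimal Python sketch. It is not Solr code: the helper names `inclusive_ranges` and `count_per_range` are made up for illustration, and it only assumes that each range is inclusive of both endpoints, as described above.

```python
from datetime import datetime, timedelta

def inclusive_ranges(start, end, gap):
    """Yield (low, high) pairs where each high equals the next low."""
    low = start
    while low < end:
        yield low, low + gap
        low += gap

def count_per_range(events, ranges):
    """Count events with low <= event <= high (both endpoints inclusive)."""
    return [sum(low <= e <= high for e in events) for low, high in ranges]

start = datetime(2007, 9, 27, 12, 0)
end = datetime(2007, 9, 28, 0, 0)
ranges = list(inclusive_ranges(start, end, timedelta(hours=6)))

# A single event sitting exactly on the shared boundary.
events = [datetime(2007, 9, 27, 18, 0)]

counts = count_per_range(events, ranges)
print(counts)  # [1, 1]: one event, counted in both ranges
```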

Couldn't we create the ranges so that they don't overlap? Something like:
2007-09-27 12:00 -> 2007-09-27 17:59:59.999 for the first one and
2007-09-27 18:00 -> 2007-09-27 23:59:59.999 for the second one.

I don't think people use date facetting with millisecond granularity, so
subtracting 1 millisecond shouldn't be much of a problem in practice.
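
A rough sketch of that proposal in Python, assuming events are stored with at most millisecond precision (an event at 17:59:59.9995 would fall between the two ranges). The function name `non_overlapping_ranges` is hypothetical and nothing here is Solr's implementation.

```python
from datetime import datetime, timedelta

ONE_MS = timedelta(milliseconds=1)

def non_overlapping_ranges(start, end, gap):
    """Yield (low, high) pairs whose high stops 1 ms before the next low."""
    low = start
    while low < end:
        yield low, low + gap - ONE_MS
        low += gap

start = datetime(2007, 9, 27, 12, 0)
end = datetime(2007, 9, 28, 0, 0)
ranges = list(non_overlapping_ranges(start, end, timedelta(hours=6)))

# The boundary event now matches only the range that starts at 18:00.
events = [datetime(2007, 9, 27, 18, 0)]
counts = [sum(low <= e <= high for e in events) for low, high in ranges]
print(counts)  # [0, 1]
```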

Thanks for any comment.

--
Guillaume

Re: Date facetting and ranges overlapping

hossman
: I'm now using date facetting to browse events. It works really well
: and is really useful. The only problem so far is that if I have an
: event which is exactly on the boundary of two ranges, it is referenced
: twice.

yeah, this is one of the big caveats with date faceting right now ... i
struggled with this a bit when designing it, and ultimately decided to
punt on the issue.  the biggest hangup was that even if the facet counting
code was smart about making sure the ranges don't overlap, the range query
syntax in the QueryParser doesn't support ranges that exclude only one
endpoint (so there wouldn't be a lot you can do with the ranges once you
know the counts in them)

one idea i had in SOLR-258 was that we could add an "interval" option that
would define how much to add to the "end" of one range to get the "start"
of another range (think of the current implementation having interval
hardcoded to "0") which would solve the problem and work with range
queries that were inclusive of both endpoints, but would require people to
use "-1MILLI" a lot.
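
One possible reading of the "interval" idea, sketched in Python. This is an interpretation, not what SOLR-258 specifies: `ranges_with_interval` is a hypothetical helper, and the assumption is that each range ends at low + gap and the next one starts at that end plus the interval.

```python
from datetime import datetime, timedelta

def ranges_with_interval(start, end, gap, interval):
    """Yield (low, high); the next low is the previous high plus interval."""
    low = start
    while low < end:
        high = low + gap
        yield low, high
        low = high + interval

start = datetime(2007, 9, 27, 12, 0)
end = datetime(2007, 9, 28, 0, 0)
one_ms = timedelta(milliseconds=1)

# interval = 0 reproduces the current behavior: ranges share their boundary.
current = list(ranges_with_interval(start, end, timedelta(hours=6), timedelta(0)))

# To keep ranges aligned on 6-hour marks, the gap itself has to be shortened
# by 1 ms, which is the "-1MILLI" bookkeeping mentioned above.
aligned = list(ranges_with_interval(start, end, timedelta(hours=6) - one_ms, one_ms))
for low, high in aligned:
    print(low.isoformat(), "->", high.isoformat())
```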

a better option (assuming a query parser change) would be a new option
that says whether each computed range should be inclusive of the low point,
the high point, both end points, neither end point, or be "smart" (where
smart is the same as "low" except for the last range, where it includes
both)
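
A small Python sketch of how such an inclusion option could behave. The mode names ("low", "high", "both", "neither", "smart") are taken from the paragraph above; the functions `in_range` and `count_events` are hypothetical and do not correspond to any existing Solr parameter or API.

```python
from datetime import datetime

MODES = ("low", "high", "both", "neither", "smart")

def in_range(value, low, high, include="smart", is_last=False):
    """Test membership in a range under the given endpoint-inclusion mode."""
    if include not in MODES:
        raise ValueError(f"unknown include mode: {include}")
    if include == "smart":
        # "smart" acts like "low", except the last range also includes its end.
        include = "both" if is_last else "low"
    low_ok = value >= low if include in ("low", "both") else value > low
    high_ok = value <= high if include in ("high", "both") else value < high
    return low_ok and high_ok

def count_events(events, ranges, include):
    """Count events per range, flagging the final range for "smart" mode."""
    last = len(ranges) - 1
    return [
        sum(in_range(e, low, high, include, i == last) for e in events)
        for i, (low, high) in enumerate(ranges)
    ]

bounds = [datetime(2007, 9, 27, 12), datetime(2007, 9, 27, 18), datetime(2007, 9, 28, 0)]
ranges = list(zip(bounds, bounds[1:]))
events = list(bounds)  # one event on every boundary

print(count_events(events, ranges, "both"))   # [2, 2]: 18:00 is counted twice
print(count_events(events, ranges, "smart"))  # [1, 2]: every event counted once
```

With "smart", the three boundary events add up to three across the two ranges, so nothing is double-counted and the final end point is not lost.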

(I think there's already a lucene issue to add the query parser support, i
just haven't had time to look at it)

The simple workaround: if you know all of your data is indexed with
perfect 0.000 second precision, then put "-1MILLI" at the end of your start
and end date faceting params.
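
For illustration, here is how the request parameters for that workaround might be built in Python. The `facet.date.*` parameters and the "-1MILLI" date math come from Solr's date faceting; the field name `event_date`, the query, and the dates are assumptions made up for this example.

```python
from urllib.parse import urlencode

params = {
    "q": "*:*",
    "rows": 0,
    "facet": "true",
    "facet.date": "event_date",  # hypothetical field name
    # Shift both ends back by 1 ms so a 18:00:00.000 event is only in one range.
    "facet.date.start": "2007-09-27T12:00:00Z-1MILLI",
    "facet.date.end": "2007-09-28T00:00:00Z-1MILLI",
    "facet.date.gap": "+6HOURS",
}

# urlencode percent-encodes the leading "+" of the gap, which would otherwise
# be decoded as a space on the server side.
query_string = urlencode(params)
print(query_string)
```

The computed ranges then run from 11:59:59.999 to 17:59:59.999 and from 17:59:59.999 to 23:59:59.999, so with data indexed at whole-second precision an event at 18:00:00 only matches the second one.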



-Hoss


Re: Date facetting and ranges overlapping

Guillaume Smet
On 9/27/07, Chris Hostetter <[hidden email]> wrote:
> a better option (assuming a query parser change) would be a new option
> that says whether each computed range should be inclusive of the low point,
> the high point, both end points, neither end point, or be "smart" (where
> smart is the same as "low" except for the last range, where it includes
> both)

That could be really cool.

> The simple workaround: if you know all of your data is indexed with
> perfect 0.000 second precision, then put "-1MILLI" at the end of your start
> and end date faceting params.

Good idea. The only problem is that I'll have to modify my client code
to deal with the fact that Solr now returns 17:59:59 instead of
18:00:00. Not difficult, but less clean than before.
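
The client-side adjustment could look roughly like this in Python. It is only a sketch: `display_boundary` is a hypothetical helper, and it assumes the boundaries come back as UTC strings such as "2007-09-27T17:59:59.999Z", with the fractional part possibly absent.

```python
from datetime import datetime, timedelta

def display_boundary(solr_date):
    """Add back the 1 ms removed by "-1MILLI" and format for display."""
    # Accept values with or without a fractional-second part.
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ" if "." in solr_date else "%Y-%m-%dT%H:%M:%SZ"
    shifted = datetime.strptime(solr_date, fmt) + timedelta(milliseconds=1)
    # Formatting to whole seconds drops any leftover milliseconds.
    return shifted.strftime("%Y-%m-%d %H:%M:%S")

print(display_boundary("2007-09-27T17:59:59.999Z"))  # 2007-09-27 18:00:00
```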

Thanks for the advice. I'll give it a try.

--
Guillaume

Re: Date facetting and ranges overlapping

Guillaume Smet
On 9/27/07, Chris Hostetter <[hidden email]> wrote:
> The simple workaround: if you know all of your data is indexed with
> perfect 0.000 second precision, then put "-1MILLI" at the end of your start
> and end date faceting params.

It fixed my problem. Thanks.

--
Guillaume