Multiple time ranges in a document

Vijay Santhanam
Hello,

I'm using a RangeFilter to find "Event" documents (with Start and End date
fields in a Lucene-friendly format) that match a user's time range query. This
works perfectly, in sub-second times at decent loads, but I'm having trouble
searching multiple performances in a single document. Indexing them is no
problem, because I can add extra terms to the start and end fields.

Here's a situation that doesn't work too well with the RangeFilter:

Let's say a comedian has a regular gig every Monday for the next 3 weeks,
from 7pm-9pm. So, the start field will be 200702191900, 200702261900,
200703051900. And, the end field will be 200702192100, 200702262100,
200703052100.

If someone searches for an event on a Thursday at any time during his 3-week
stint, the comedian's event will show up, even though he only performs on
Mondays, because the RangeFilter will consider the lowest term of the start
field and the highest term of the end field.
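
A minimal plain-Java sketch of the false positive described above. It does not use the Lucene API; it only models, as an assumption, a filter that tests each multi-valued field on its own. The class and method names are hypothetical.

```java
final class RangeFilterPitfall {
    // Simplified model (an assumption, not Lucene code): each multi-valued
    // field is tested independently, so ANY start term and ANY end term may
    // satisfy its half of the condition.
    static boolean matchesIndependently(long[] starts, long[] ends, long qStart, long qEnd) {
        boolean someStart = false;
        boolean someEnd = false;
        for (long s : starts) someStart |= s <= qEnd;
        for (long e : ends) someEnd |= e >= qStart;
        return someStart && someEnd;
    }

    // What is actually wanted: a single performance whose own start/end
    // pair overlaps the query range. starts[i] pairs with ends[i].
    static boolean matchesPairwise(long[] starts, long[] ends, long qStart, long qEnd) {
        for (int i = 0; i < starts.length; i++) {
            if (starts[i] <= qEnd && ends[i] >= qStart) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        long[] starts = {200702191900L, 200702261900L, 200703051900L};
        long[] ends = {200702192100L, 200702262100L, 200703052100L};
        // Query: Thursday 22 February 2007, all day.
        long qStart = 200702220000L;
        long qEnd = 200702222359L;
        System.out.println(matchesIndependently(starts, ends, qStart, qEnd)); // true (false positive)
        System.out.println(matchesPairwise(starts, ends, qStart, qEnd));      // false
    }
}
```

The first check passes because the first Monday's start is before the Thursday and the last Monday's end is after it, although no single performance covers that day.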

 

Also, sorting by the start or end field will break, but I could write my own
SortComparatorSource to fix that.

How could I get around the filter problem? I could write my own filter, but
it would need to keep track of both fields and the term positions within
each one.

Thanks for your help,

-Vijay

 

Re: Multiple time ranges in a document

mark harwood
The problem arises because the document defines multiple ranges, and it is not easy to test start/end value pairs when they are held as independent values in separate fields. AFAIK there is currently no query implementation for testing position relationships between words from more than one field.

There is one sneaky way, however, to record and query range information in a single field: you can (ab)use the position information stored by Lucene to encode data such as time, and then query for ranges using SpanQueries.

By writing a custom analyzer you can artificially "place" words at a required location in a field by setting the position increment value appropriately.
If you choose to think of the extent of the field as representing time, e.g. the hours or minutes in a day/week, you can post pairs of "start" and "end" words at the appropriate locations to effectively record each range of time or date information.

This solves the problem of recording pairs of ranges that can be queried.
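
As a rough illustration of the encoding idea, the plain-Java sketch below turns the three gigs from the original question into token positions (minutes since a base date) and the position increments a custom analyzer would have to emit. The base date, the class name, and the convention that positions begin at -1 before the first token are assumptions made for this sketch; no Lucene classes are used.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.Arrays;

final class TimePositionEncoder {
    // Hypothetical base date: position 0 is 2007-01-01 00:00.
    static final LocalDateTime BASE = LocalDateTime.of(2007, 1, 1, 0, 0);
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyyMMddHHmm");

    // Token position = whole minutes between the base date and the timestamp.
    static int position(String stamp) {
        return (int) ChronoUnit.MINUTES.between(BASE, LocalDateTime.parse(stamp, FMT));
    }

    // Increment for each token relative to the previous one, assuming the
    // position counter starts at -1 (so an increment of 1 means position 0).
    static int[] increments(int[] sortedPositions) {
        int[] result = new int[sortedPositions.length];
        int previous = -1;
        for (int i = 0; i < sortedPositions.length; i++) {
            result[i] = sortedPositions[i] - previous;
            previous = sortedPositions[i];
        }
        return result;
    }

    public static void main(String[] args) {
        // Alternating "start" / "end" tokens for the three Monday gigs.
        String[] stamps = {"200702191900", "200702192100", "200702261900",
                           "200702262100", "200703051900", "200703052100"};
        int[] positions = Arrays.stream(stamps).mapToInt(TimePositionEncoder::position).toArray();
        System.out.println(Arrays.toString(increments(positions)));
        // prints [71701, 120, 9960, 120, 9960, 120]
    }
}
```

Each "start" and "end" word then sits at the minute it represents, so a 7pm-9pm gig leaves the two words exactly 120 positions apart.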

Now for your query: your example query is actually a collection of ranges rather than one single range (just Thursdays), so it would need to be expressed as multiple SpanQueries, each testing an area of the document that represents a time period of interest to see whether it contains a start and end pair.
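
The windowed test can be modelled in plain Java as below. This is only a sketch of the logic, not SpanQuery code: the field is represented as parallel arrays of terms and positions (minutes since a hypothetical base of 2007-01-01 00:00), and all names are made up for illustration. Note that it checks for a start/end pair lying wholly inside a window, which is stricter than an overlap test.

```java
final class WindowPairCheck {
    // True if the window [from, to] contains a "start" token immediately
    // followed by an "end" token, both inside the window.
    static boolean windowHasPair(String[] terms, int[] positions, int from, int to) {
        for (int i = 0; i + 1 < terms.length; i++) {
            boolean pair = terms[i].equals("start") && terms[i + 1].equals("end");
            if (pair && positions[i] >= from && positions[i + 1] <= to) return true;
        }
        return false;
    }

    // A query such as "any Thursday" is a set of windows; one hit is enough.
    static boolean anyWindowHasPair(String[] terms, int[] positions, int[][] windows) {
        for (int[] w : windows) {
            if (windowHasPair(terms, positions, w[0], w[1])) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String[] terms = {"start", "end", "start", "end", "start", "end"};
        int[] positions = {71700, 71820, 81780, 81900, 91860, 91980};
        int day = 1440; // minutes per day
        // Thursdays 22 Feb, 1 Mar and 8 Mar 2007 are days 52, 59 and 66 after the base.
        int[][] thursdays = {{52 * day, 53 * day - 1}, {59 * day, 60 * day - 1}, {66 * day, 67 * day - 1}};
        // Monday 19 Feb 2007 is day 49 after the base.
        int[][] monday = {{49 * day, 50 * day - 1}};
        System.out.println(anyWindowHasPair(terms, positions, thursdays)); // false
        System.out.println(anyWindowHasPair(terms, positions, monday));    // true
    }
}
```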

Worth a try?

Mark

----- Original Message ----
From: Vijay Santhanam <[hidden email]>
To: [hidden email]
Sent: Sunday, 18 February, 2007 11:43:39 PM
Subject: Multiple time ranges in a document

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Multiple time ranges in a document

Chris Hostetter-3

this came up on the solr list about a month ago and reminded me of
something doug mentioned during the BOF at apachecon last year ... if you
are interested in writing your own Query subclass, this thread has some
thoughts on how this might be possible...

http://www.nabble.com/One-item%2C-multiple-fields%2C-and-range-queries-tf2969183.html#a8377712

-Hoss
