Range queries

Range queries

Gwyn Carwardine
Two queries about ranges:

1. field:[a TO z] does not return the same as field:[z TO a]

I think it should. The standard QueryParser, or even the range query itself,
should work out which end is lower and swap them around if necessary.

2. How do I search for negative numbers in a range, for example
field:[-3 TO 2]?

I don't mind hacking code so that my numbers are indexed as +00000001 and
-00000001, and then overriding the query parser to change my query to
[-00000003 TO +00000002]. However, "+" is less than "-" in ASCII terms, so a
range search isn't going to work terrifically well. Is there a standard
approach?

-Gwyn



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Range queries

Erik Hatcher

On Jan 23, 2006, at 10:38 AM, Gwyn Carwardine wrote:
> Two queries about ranges:
>
> 1. field:[a TO z] does not return the same as field:[z TO a]
>
> I think it should. The standard QueryParser or even the range query
> should ascertain the lowest and highest and switch them around if
> necessary

This is quite debatable.  I personally don't think the ends of the  
ranges should be swapped when the left one is greater than the right  
one.  The word "TO" indicates the user is supplying the ends already  
in order.

But if the consensus was to adjust it, I wouldn't object.

> 2. How do I search for negative numbers in a range. For example
> field:[-3 TO 2] ?
>
> I don't mind hacking code such that my numbers are indexed as
> +00000001 and -00000001 and then I can override the query parser to
> change my query to [-00000003 TO +00000002]. However.. "+" is less than
> "-" in ASCII terms so a range search isn't going to work terrifically
> well.. Is there a standard approach??

You could leave the "+" off for positive numbers.  That'd do the  
trick, right?  "-" < "0" lexicographically.
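
The character ordering under discussion can be checked directly. This is a minimal sketch, assuming index terms are compared character by character the way Java's String.compareTo does; the class and method names are made up for illustration.

```java
public class LexOrderDemo {
    /** True when a sorts strictly before b in plain character order. */
    public static boolean sortsBefore(String a, String b) {
        return a.compareTo(b) < 0;
    }

    public static void main(String[] args) {
        System.out.println(sortsBefore("+", "-"));   // true: '+' is 0x2B, '-' is 0x2D
        System.out.println(sortsBefore("-", "0"));   // true: '-' is 0x2D, '0' is 0x30
        System.out.println(sortsBefore("-3", "2"));  // true
        System.out.println(sortsBefore("-3", "-2")); // false: after the sign, '3' > '2'
    }
}
```

The last line shows that a leading "-" alone does not order negative values among themselves.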

        Erik


RE: Range queries

Gwyn Carwardine
>> 2. How do I search for negative numbers in a range. For example
>> field:[-3 TO 2] ?
>>
>> I don't mind hacking code such that my numbers are indexed as
>> +00000001 and -00000001 and then I can override the query parser to
>> change my query to [-00000003 TO +00000002]. However.. "+" is less than
>> "-" in ASCII terms so a range search isn't going to work terrifically
>> well.. Is there a standard approach??
>
> You could leave the "+" off for positive numbers.  That'd do the
> trick, right?  "-" < "0" lexicographically.
>
> Erik

Hi Erik,

Good point! Of course, it still doesn't work, because "-3" is greater than
"-2" lexicographically!

I've done something else, where 0 is represented by 10000000000000000000.
Any non-negative number has a "1" followed by the number left-padded with
zeroes to length 19:

12345 is therefore 10000000000000012345

Any negative number has a "0" followed by (number - long.minvalue),
left-padded with zeroes to length 19:

-12345 is therefore 09223372036854763463

This works fine. Of course, it got much more difficult with floating-point
numbers, where you have to mess around with a positive and negative mantissa
and exponent. But it was doable.
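
The integer scheme described above could be sketched roughly as follows. This is not Gwyn's actual code; the class and method names are assumptions, and it relies only on standard Java arithmetic and formatting.

```java
import java.util.Locale;

public class SignedLongEncoder {
    /**
     * Encodes a long as a 20-character string whose character order
     * matches numeric order: a sign digit followed by 19 decimal digits.
     */
    public static String encode(long value) {
        if (value >= 0) {
            // "1" prefix, then the value left-padded with zeroes to 19 digits.
            return "1" + String.format(Locale.ROOT, "%019d", value);
        }
        // "0" prefix, then (value - Long.MIN_VALUE). For negative input the
        // wrapped subtraction lands in 0..Long.MAX_VALUE, so it fits a long.
        return "0" + String.format(Locale.ROOT, "%019d", value - Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        System.out.println(encode(0L));      // 10000000000000000000
        System.out.println(encode(12345L));  // 10000000000000012345
        System.out.println(encode(-12345L)); // 09223372036854763463
    }
}
```

With this encoding, encode(-3) sorts before encode(-2), which in turn sorts before encode(2), so a range such as [-3 TO 2] can be expressed on the encoded terms.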

-g


Re: Range queries

John Haxby-2
In reply to this post by Erik Hatcher
Erik Hatcher wrote:

>> 2. How do I search for negative numbers in a range. For example
>> field:[-3 TO 2] ?
>>
>> I don't mind hacking code such that my numbers are indexed as
>> +00000001 and -00000001 and then I can override the query parser to
>> change my query to [-00000003 TO +00000002]. However.. "+" is less than
>> "-" in ASCII terms so a range search isn't going to work terrifically
>> well.. Is there a standard approach??
>
> You could leave the "+" off for positive numbers.  That'd do the
> trick, right?  "-" < "0" lexicographically.

As Gwyn pointed out, that would make -3 > -2.   Personally, I'd use
unsigned numbers and shift the range -- for 16-bit numbers I'd map
-32768..32767 to 0..65535 by adding 32768.  I guess you could do that by
overriding getRangeQuery() (LIA, p207 -- wonderful book).
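
A minimal sketch of the 16-bit shift, with hypothetical class and method names. It also zero-pads the shifted value to five digits, which the post does not mention but which is needed for the strings to sort in numeric order.

```java
import java.util.Locale;

public class ShortOffsetEncoder {
    private static final int OFFSET = 32768;

    /** Maps -32768..32767 onto "00000".."65535". */
    public static String encode(short value) {
        return String.format(Locale.ROOT, "%05d", value + OFFSET);
    }

    /** Reverses encode(), for displaying stored terms. */
    public static short decode(String term) {
        return (short) (Integer.parseInt(term) - OFFSET);
    }

    public static void main(String[] args) {
        System.out.println(encode((short) -3)); // 32765
        System.out.println(encode((short) 2));  // 32770
    }
}
```

The range [-3 TO 2] then becomes [32765 TO 32770] on the indexed terms.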

jch

Re: Range queries

Chris Hostetter-3

: As Gwyn pointed out, that would make -3 > -2.   Personally, I'd use
: unsigned numbers and shift the range -- for 16-bit numbers I'd map
: -32768..32767 to 0..65535 by adding 32768.  I guess you could do that by
: overriding getRangeQuery() (LIA, p207 -- wonderful book).

There are a lot of different techniques for encoding numeric values as
lexicographically ordered strings; finding the right solution for any
given case depends mainly on what the scope of your values is. If
you're only ever dealing with the numbers 1-10, there are some really
easy options.  If you want something that can handle any "long", take a
look at the NumberTools class in SVN.  Even if what you want is something
that can handle any int, the technique used in that class can still be
applied.


As for the query parsing aspect -- subclassing and overriding the
getRangeQuery method to know which fields to encode using your method of
choice is very easy to do.
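
The per-field decision could look something like the sketch below. It deliberately does not call Lucene: RangeBoundEncoder, its field set, and the pluggable encoder are all hypothetical, and in a real QueryParser subclass the same logic would presumably sit inside the overridden getRangeQuery before delegating to the superclass.

```java
import java.util.Locale;
import java.util.Set;
import java.util.function.LongFunction;

public class RangeBoundEncoder {
    private final Set<String> numericFields;
    private final LongFunction<String> encoder;

    public RangeBoundEncoder(Set<String> numericFields, LongFunction<String> encoder) {
        this.numericFields = numericFields;
        this.encoder = encoder;
    }

    /** Returns {lower, upper}: encoded for numeric fields, unchanged otherwise. */
    public String[] encodeBounds(String field, String lower, String upper) {
        if (!numericFields.contains(field)) {
            return new String[] {lower, upper};
        }
        return new String[] {
            encoder.apply(Long.parseLong(lower.trim())),
            encoder.apply(Long.parseLong(upper.trim()))
        };
    }

    public static void main(String[] args) {
        // Example encoder: shift by 32768 and zero-pad (valid for 16-bit values only).
        LongFunction<String> shift16 = v -> String.format(Locale.ROOT, "%05d", v + 32768);
        RangeBoundEncoder enc = new RangeBoundEncoder(Set.of("price"), shift16);
        String[] b = enc.encodeBounds("price", "-3", "2");
        System.out.println("price:[" + b[0] + " TO " + b[1] + "]"); // price:[32765 TO 32770]
    }
}
```

Passing the encoder in as a function keeps the field lookup independent of whichever string encoding is chosen.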


-Hoss


RE: Range queries

Mike Streeton
In reply to this post by Gwyn Carwardine
I can recommend this method; this is how we do it, but what we store in
the index is the long converted to a 16-digit hex number. The extended
parser converts entered queries on long fields to hex. We obviously also
convert back before we display the value. Floating-point numbers are more
difficult, and so far I have used the same technique to do fixed-position
floats.
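
One way to get a sortable 16-digit hex string from a long is sketched below. The post does not say how negative values are handled, so flipping the sign bit is an assumption of this sketch: plain two's-complement hex would put negatives (ffff...) after positives. Class and method names are hypothetical.

```java
import java.util.Locale;

public class HexLongEncoder {
    /** 16 lowercase hex digits; flipping the sign bit makes negatives sort first. */
    public static String encode(long value) {
        return String.format(Locale.ROOT, "%016x", value ^ Long.MIN_VALUE);
    }

    /** Reverses encode(), for displaying stored terms. */
    public static long decode(String term) {
        return Long.parseUnsignedLong(term, 16) ^ Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(encode(-3L)); // 7ffffffffffffffd
        System.out.println(encode(0L));  // 8000000000000000
        System.out.println(encode(2L));  // 8000000000000002
    }
}
```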

Mike

RE: Range queries

Mike Streeton
In reply to this post by Gwyn Carwardine
Sorry, I forgot to mention: what you do for floats is take everything to the
left of the decimal point, encode it as 16-digit hex (via long), then append
the decimal point and everything following it. The only problem we tend
to find is that searching across large ranges either produces an exception
about too many Boolean clauses or does not return any results at all.
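
The float encoding described above might be sketched as follows, under the assumption that values are non-negative decimal strings with at least one digit before the point. Negative values are rejected because their fractional parts would sort in the wrong direction under this simple scheme. The class name is hypothetical.

```java
import java.util.Locale;

public class FixedPointHexEncoder {
    /** Hex-encodes the integer part to 16 digits and appends the fraction unchanged. */
    public static String encode(String decimal) {
        String text = decimal.trim();
        if (text.startsWith("-")) {
            throw new IllegalArgumentException("negative values are not handled by this sketch");
        }
        int dot = text.indexOf('.');
        String whole = dot < 0 ? text : text.substring(0, dot);
        String fraction = dot < 0 ? "" : text.substring(dot); // keeps the "."
        return String.format(Locale.ROOT, "%016x", Long.parseLong(whole)) + fraction;
    }

    public static void main(String[] args) {
        System.out.println(encode("12.25")); // 000000000000000c.25
        System.out.println(encode("12.5"));  // 000000000000000c.5
        System.out.println(encode("255"));   // 00000000000000ff
    }
}
```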

Mike
