Range queries in Lucene - numerical or lexicographical


Range queries in Lucene - numerical or lexicographical

Nilesh Bansal
Hi all,

The Lucene query parser syntax page
(http://lucene.apache.org/java/docs/queryparsersyntax.html) gives
the following two examples of range queries:
mod_date:[20020101 TO 20030101]
and
title:{Aida TO Carmen}

Now my question is this: numerically, 10 is greater than 2, but in a
string-only comparison, "2" is greater than "10". So if I search for
field:[10 TO 30]
will a document with field=2 be in the result or not?

And if I search a string field with
field:[AA TO CC]
will a document with field="B" be in the result or not?

The semantics of the range (numerical or lexicographical) are not clear
from the documentation.
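To make the lexicographical case concrete, here is a small standalone Java sketch (plain Java, not Lucene code). It assumes that an inclusive range `[lower TO upper]` simply checks `lower <= value <= upper` using `String.compareTo` on the raw indexed strings; the class and method names are made up for illustration.

```java
public class LexRangeDemo {
    // Hypothetical helper: inclusive lexicographical range test,
    // mimicking field:[lower TO upper] on raw string terms.
    static boolean inRange(String value, String lower, String upper) {
        return value.compareTo(lower) >= 0 && value.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        // '2' > '1', so "2" sorts after "10"; '2' < '3', so it sorts before "30".
        System.out.println(inRange("2", "10", "30")); // true
        // 'B' > 'A', so "B" sorts after "AA"; 'B' < 'C', so it sorts before "CC".
        System.out.println(inRange("B", "AA", "CC")); // true
    }
}
```

Under that assumption, both answers are "yes": as strings, "2" falls between "10" and "30", and "B" falls between "AA" and "CC".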

thanks
Nilesh

--
Nilesh Bansal.
http://queens.db.toronto.edu/~nilesh/

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Range queries in Lucene - numerical or lexicographical

Erick Erickson
As has been discussed several times, Lucene is a string-only engine and
has no native understanding of numeric values, so range queries compare
terms lexicographically. You have to normalize numbers so that their
string order matches their numeric order. See NumberTools.
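A minimal sketch of the normalization idea, assuming non-negative values only: left-pad every number with zeros to a fixed width so that string order matches numeric order. The `pad` helper and the 19-digit width (the length of `Long.MAX_VALUE`) are illustrative choices, not the actual NumberTools encoding.

```java
public class PadDemo {
    // Hypothetical helper: zero-pad a non-negative long to 19 digits,
    // the number of digits in Long.MAX_VALUE, so all values share one width.
    static String pad(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("this sketch handles non-negative values only");
        }
        return String.format("%019d", value);
    }

    // Inclusive lexicographical range test on the padded strings.
    static boolean inRange(String value, String lower, String upper) {
        return value.compareTo(lower) >= 0 && value.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        String lower = pad(10);
        String upper = pad(30);
        System.out.println(inRange(pad(2), lower, upper));  // false
        System.out.println(inRange(pad(25), lower, upper)); // true
    }
}
```

The same padding has to be applied both when indexing the field and when building the query bounds; otherwise the comparison is between strings of different widths again.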

Best
Erick

(In reply to Nilesh Bansal's post of 8/11/07.)
Reply | Threaded
Open this post in threaded view
|

Re: Range queries in Lucene - numerical or lexicographical

Nilesh Bansal
Thanks. Probably this should be mentioned on the documentation page.

-Nilesh

(In reply to Erick Erickson's post of 8/12/07.)


--
Nilesh Bansal.
http://queens.db.toronto.edu/~nilesh/

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Range queries in Lucene - numerical or lexicographical

is_maximum
Thanks, Erick, but unfortunately NumberTools works only with the long
primitive type. I am wondering why you didn't add methods for double and float.
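One possible way to cover double and float, sketched here as an assumption rather than anything NumberTools provides: reinterpret the IEEE 754 bits as an integer, flip bits so that integer order matches numeric order, and write the result as fixed-width hex. The names `encodeDouble` and `encodeFloat` are hypothetical, NaN is not treated specially, and -0.0 sorts just below 0.0.

```java
public class SortableDoubleDemo {
    // Map a double to a long whose signed order matches numeric order.
    // Non-negative doubles already order correctly as raw bits; for negatives,
    // flipping the 63 non-sign bits makes larger magnitudes become smaller longs.
    static long sortableBits(double d) {
        long bits = Double.doubleToLongBits(d);
        return bits ^ ((bits >> 63) & 0x7fffffffffffffffL);
    }

    // Flip the sign bit so unsigned order equals signed order, then print
    // 16 zero-padded hex digits. '0'-'9' sort before 'a'-'f' in ASCII, so
    // fixed-width hex strings compare like the unsigned values they encode.
    static String encodeDouble(double d) {
        long unsigned = sortableBits(d) ^ Long.MIN_VALUE;
        return String.format("%016x", unsigned);
    }

    // Same idea for float, using the 32-bit representation and 8 hex digits.
    static String encodeFloat(float f) {
        int bits = Float.floatToIntBits(f);
        int sortable = bits ^ ((bits >> 31) & 0x7fffffff);
        return String.format("%08x", sortable ^ Integer.MIN_VALUE);
    }

    public static void main(String[] args) {
        double[] values = {-1.5, -0.25, 0.0, 2.0, 10.0};
        // The encoded strings print in the same order as the numbers.
        for (double v : values) {
            System.out.println(encodeDouble(v) + "  " + v);
        }
    }
}
```

A range query would then use encoded bounds such as `encodeDouble(1.5)` and `encodeDouble(9.75)`, and the same encoding must be applied at index time.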



(In reply to Nilesh Bansal's post of 8/13/07.)


--
Regards,
Mohammad
--------------------------
see my blog: http://brainable.blogspot.com/
another in Persian: http://fekre-motefavet.blogspot.com/
--
Regards
Mohammad
Pixelshot
Reply | Threaded
Open this post in threaded view
|

Re: Range queries in Lucene - numerical or lexicographical

hossman
In reply to this post by Nilesh Bansal

: Subject: Re: Range queries in Lucene - numerical or lexicographical
:
: Thanks. Probably this should be mentioned on the documentation page.

it does say right above the "date" example: "....  Sorting is done
lexicographically."

(Admittedly I'm not sure why the word "Sorting" is used in that sentence,
but it should make it clear that it's a lexicographical comparison.)

patches to improve documentation are always appreciated!




-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Range queries in Lucene - numerical or lexicographical

Erick Erickson
In reply to this post by is_maximum
Uhhhh, because I didn't write the code? You can always contribute a patch.

(In reply to Mohammad Norouzi's post of 8/13/07.)