Using RangeFilter

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Using RangeFilter

vivek sar
Hi,

 I have a requirement to filter out documents by date range. I'm using
RangeFilter (in combination to FilteredQuery) to do this. I was under
the impression the filtering is done on documents, thus I'm just
storing the date values, but not indexing them. As every new document
would have a new date value indexing each date value field for every
new document would be very expensive. We index pretty much over 10K
new documents every minute, so I want to minimize the number of values
I need to index.

This is what I want to do,

  doc.add(new Field("optime", getDateStr(rs.getDate("optime")),
                                        Field.Store.YES, Field.Index.NO));

 When I do this I always get 0 hits, but if I turn on indexing for
date (Field.Index.NO_NORM) then I'm getting the right result. But,
turning indexing on date field doubles my index size.

 My date format: 20080200000000 (yyyyMMddhhmiss)

Is there any way I can filter on date value without indexing them?

Thanks,
-vivek

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Using RangeFilter

Otis Gospodnetic-2
Hi,

Do you really need to store those dates?  Why not just index them and not store them if index size is a concern?

Otis

--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: vivek sar <[hidden email]>
To: [hidden email]
Sent: Saturday, January 19, 2008 8:06:25 PM
Subject: Using RangeFilter

Hi,

 I have a requirement to filter out documents by date range. I'm using
RangeFilter (in combination to FilteredQuery) to do this. I was under
the impression the filtering is done on documents, thus I'm just
storing the date values, but not indexing them. As every new document
would have a new date value indexing each date value field for every
new document would be very expensive. We index pretty much over 10K
new documents every minute, so I want to minimize the number of values
I need to index.

This is what I want to do,

  doc.add(new Field("optime", getDateStr(rs.getDate("optime")),
                    Field.Store.YES, Field.Index.NO));

 When I do this I always get 0 hits, but if I turn on indexing for
date (Field.Index.NO_NORM) then I'm getting the right result. But,
turning indexing on date field doubles my index size.

 My date format: 20080200000000 (yyyyMMddhhmiss)

Is there any way I can filter on date value without indexing them?

Thanks,
-vivek

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]





---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Using RangeFilter

vivek sar
I need to be able to sort on optime as well, thus need to store it .
Isn't there any way to filter without indexing? Not sure why do I need
to index some field I need to filter on. I thought we could get all
the documents from an index and then filter out the documents from it
for a field within a given range - isn't this possible?

Just a side question, what takes more space - indexing a field value
or storing a field value?

Thanks,
-vivek

On Jan 19, 2008 7:20 PM, Otis Gospodnetic <[hidden email]> wrote:

> Hi,
>
> Do you really need to store those dates?  Why not just index them and not store them if index size is a concern?
>
> Otis
>
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
> ----- Original Message ----
> From: vivek sar <[hidden email]>
> To: [hidden email]
> Sent: Saturday, January 19, 2008 8:06:25 PM
> Subject: Using RangeFilter
>
> Hi,
>
>  I have a requirement to filter out documents by date range. I'm using
> RangeFilter (in combination to FilteredQuery) to do this. I was under
> the impression the filtering is done on documents, thus I'm just
> storing the date values, but not indexing them. As every new document
> would have a new date value indexing each date value field for every
> new document would be very expensive. We index pretty much over 10K
> new documents every minute, so I want to minimize the number of values
> I need to index.
>
> This is what I want to do,
>
>   doc.add(new Field("optime", getDateStr(rs.getDate("optime")),
>                     Field.Store.YES, Field.Index.NO));
>
>  When I do this I always get 0 hits, but if I turn on indexing for
> date (Field.Index.NO_NORM) then I'm getting the right result. But,
> turning indexing on date field doubles my index size.
>
>  My date format: 20080200000000 (yyyyMMddhhmiss)
>
> Is there any way I can filter on date value without indexing them?
>
> Thanks,
> -vivek
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Using RangeFilter

Shai Erera
You can try to write your own HitCollector and on its collect(int doc, float
score) method read the doc's date value and decide if it passes the filter
or not.
That is an approach I use for similar tasks.

On Jan 20, 2008 6:51 AM, vivek sar <[hidden email]> wrote:

> I need to be able to sort on optime as well, thus need to store it .
> Isn't there any way to filter without indexing? Not sure why do I need
> to index some field I need to filter on. I thought we could get all
> the documents from an index and then filter out the documents from it
> for a field within a given range - isn't this possible?
>
> Just a side question, what takes more space - indexing a field value
> or storing a field value?
>
> Thanks,
> -vivek
>
> On Jan 19, 2008 7:20 PM, Otis Gospodnetic <[hidden email]>
> wrote:
> > Hi,
> >
> > Do you really need to store those dates?  Why not just index them and
> not store them if index size is a concern?
> >
> > Otis
> >
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> >
> > ----- Original Message ----
> > From: vivek sar <[hidden email]>
> > To: [hidden email]
> > Sent: Saturday, January 19, 2008 8:06:25 PM
> > Subject: Using RangeFilter
> >
> > Hi,
> >
> >  I have a requirement to filter out documents by date range. I'm using
> > RangeFilter (in combination to FilteredQuery) to do this. I was under
> > the impression the filtering is done on documents, thus I'm just
> > storing the date values, but not indexing them. As every new document
> > would have a new date value indexing each date value field for every
> > new document would be very expensive. We index pretty much over 10K
> > new documents every minute, so I want to minimize the number of values
> > I need to index.
> >
> > This is what I want to do,
> >
> >   doc.add(new Field("optime", getDateStr(rs.getDate("optime")),
> >                     Field.Store.YES, Field.Index.NO));
> >
> >  When I do this I always get 0 hits, but if I turn on indexing for
> > date (Field.Index.NO_NORM) then I'm getting the right result. But,
> > turning indexing on date field doubles my index size.
> >
> >  My date format: 20080200000000 (yyyyMMddhhmiss)
> >
> > Is there any way I can filter on date value without indexing them?
> >
> > Thanks,
> > -vivek
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [hidden email]
> > For additional commands, e-mail: [hidden email]
> >
> >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [hidden email]
> > For additional commands, e-mail: [hidden email]
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>


--
Regards,

Shai Erera
adb
Reply | Threaded
Open this post in threaded view
|

Re: Using RangeFilter

adb
In reply to this post by vivek sar
vivek sar wrote:
> I need to be able to sort on optime as well, thus need to store it .

Lucene's default sorting does not need the field to be stored, only indexed as
untokenized.
Antony




---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Using RangeFilter

vivek sar
I've a field as NO_NORM, does it has to be untokenized to be able to
sort on it?


On Jan 21, 2008 12:47 PM, Antony Bowesman <[hidden email]> wrote:

> vivek sar wrote:
> > I need to be able to sort on optime as well, thus need to store it .
>
> Lucene's default sorting does not need the field to be stored, only indexed as
> untokenized.
> Antony
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

adb
Reply | Threaded
Open this post in threaded view
|

Re: Using RangeFilter

adb
vivek sar wrote:
> I've a field as NO_NORM, does it has to be untokenized to be able to
> sort on it?

NO_NORMS is the same as UNTOKENIZED + omitNorms, so you can sort on that.
Antony



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]