Solr data type for date faceting

Karthik K-2
I have a field storing a timestamp as text (YYYYMMDDHHMM). Can I get
results like those from date faceting (July (30), August (54), etc.)?
As far as I know, Solr currently doesn't support range faceting, and even
if it does in the future, text will not be recognized as integer/long.

I tried a workaround with facet.prefix, but it cannot give the desired
result.

Thanks,
Karthik

Re: Solr data type for date faceting

Mark Allan
If you're storing the timestamp as YYYYMMDDHHMM, why don't you make it
a trie-coded numeric field rather than text? (A 12-digit value is too
large for a 32-bit 'tint', so the type would need to be 'tlong'.) That
way, I believe range queries would be more efficient. You can then do a
facet query, specifying your desired ranges as one facet query for
each range.

Note that I think you can also do facet queries with text fields, but  
in this instance, storing it as a number would probably be more  
efficient.  Your user interface can deal with translating it from  
YYYYMMDDHHMM to something more display-appropriate.
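
As a rough illustration of the display-side translation mentioned above, here is a minimal Python sketch. The helper names (month_label, fits_in_int32) are made up for this example and are not part of Solr. It also checks whether a YYYYMMDDHHMM value fits in a signed 32-bit integer, which matters when picking between 32-bit and 64-bit numeric field types.

```python
from datetime import datetime

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def month_label(stamp: str) -> str:
    """Turn a YYYYMMDDHHMM string into a label such as 'August 2010'.

    Note: %B uses the current locale's month names.
    """
    parsed = datetime.strptime(stamp, "%Y%m%d%H%M")
    return parsed.strftime("%B %Y")


def fits_in_int32(stamp: str) -> bool:
    """Report whether the timestamp, read as a number, fits in 32 bits."""
    return int(stamp) <= INT32_MAX


if __name__ == "__main__":
    example = "201008180928"  # 18 Aug 2010, 09:28
    print(month_label(example))
    print(fits_in_int32(example))
```

A 12-digit value such as 201008180928 is far above 2,147,483,647, so the second check returns False for any realistic timestamp in this format.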

Mark

On 18 Aug 2010, at 9:28 am, Karthik K wrote:

> I have a field storing timestamp as text (YYYYMMDDHHMM). Can i get the
> results as i get with date faceting? (July(30),August(54) etc)
> As per my knowledge Solr currently doesn't support range faceting, even if
> it does in the future , text will not be recognized as integer/long.
>
> Tried for a workaround with field.prefix, but it cannot give the desired
> result.
>
> Thanks,
> Karthik



Re: Solr data type for date faceting

Karthik K-2
Thanks Mark. Yeah, storing it as a trie-coded numeric field would be quite
efficient. As I cannot re-index the massive data, please let me know whether
changes I make to the schema will apply to the already indexed data. I am not
sure how type checking happens in Solr.

> You can then do a facet query, specifying your desired ranges as one facet
> query for each range.

http://wiki.apache.org/solr/SimpleFacetParameters#Facet_by_Range says it's
not implemented yet.

Please let me know if there is any workaround in my case.

Thanks,
Karthik

Re: Solr data type for date faceting

Jan Høydahl / Cominvent
If you want to change the schema on the live index, make sure you do a compatible change, as Solr does not do any type checking or schema change validation.

I would ADD a new field, under another name, for the trie-coded numeric values.
Unfortunately you have to re-index to have an index built on this field.
May I suggest that you start re-feeding a portion of the index every day until finished. Use large batches between each commit(), and make sure to run an optimize every couple of days to get rid of "dead meat" (documents marked as deleted).
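
The incremental re-feed suggested above could be organized roughly as follows. This is only a sketch of the batching logic: send_batch and commit are hypothetical callbacks standing in for whatever Solr client is actually used, and the batch size is an arbitrary example value.

```python
from itertools import islice


def batches(docs, size):
    """Yield successive lists of at most `size` documents."""
    iterator = iter(docs)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def refeed(docs, send_batch, commit, batch_size=1000):
    """Send documents in large batches, committing once per batch.

    `send_batch` and `commit` are placeholders (assumptions) for the
    real client calls. Returns the number of documents sent.
    """
    sent = 0
    for batch in batches(docs, batch_size):
        send_batch(batch)
        commit()
        sent += len(batch)
    return sent


if __name__ == "__main__":
    # Stub callbacks so the sketch runs without a Solr server.
    total = refeed(
        ({"id": i} for i in range(2500)),
        send_batch=lambda batch: print("sending", len(batch), "docs"),
        commit=lambda: print("commit"),
    )
    print("re-fed", total, "documents")
```

Running one such pass per day over a different slice of the source data, with an occasional optimize in between, matches the schedule described in the post.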

If you simply do not have the original data to refeed, perhaps it is possible to extract all string values offline from the index and somehow rebuild the index offline?

Andrzej, is that possible?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 18. aug. 2010, at 11.12, Karthik K wrote:

> Thanks Mark. Yeah, storing it as 'tint' would be quite efficient.As i cannot
> re-index the massive data, please let me know if the changes i make in
> schema reflect to the already indexed data? I am not  sure how type checking
> happens in solr.
>
> You can then do a facet query, specifying your desired ranges as one facet
> query for each range.
>
> http://wiki.apache.org/solr/SimpleFacetParameters#Facet_by_Range says its
> not implemented yet.
>
> Please let me know if there is any workaround in my case.
>
> Thanks,
> Karthik


Re: Solr data type for date faceting

Karthik K-2
Adding facet.query=timestamp:[201006010000+TO+201006302359]&facet.query=timestamp:[201007010000+TO+201007312359]...
to the query should give the desired response without changing the schema or
re-indexing.
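
For anyone who wants to generate these parameters rather than type them by hand, here is a minimal Python sketch. The function names are invented for this example, the field name timestamp is taken from the query above, and the q=*:* and rows=0 parameters are assumptions about how the request would be issued.

```python
import calendar
from urllib.parse import urlencode


def month_facet_queries(field, year, months):
    """Build one range query string per month for a YYYYMMDDHHMM field."""
    queries = []
    for month in months:
        # monthrange returns (weekday of first day, number of days in month)
        last_day = calendar.monthrange(year, month)[1]
        start = f"{year:04d}{month:02d}010000"
        end = f"{year:04d}{month:02d}{last_day:02d}2359"
        queries.append(f"{field}:[{start} TO {end}]")
    return queries


def facet_params(field, year, months):
    """URL-encode a request with one facet.query parameter per month."""
    params = [("q", "*:*"), ("rows", "0"), ("facet", "true")]
    params += [("facet.query", q) for q in month_facet_queries(field, year, months)]
    return urlencode(params)


if __name__ == "__main__":
    for query in month_facet_queries("timestamp", 2010, [6, 7]):
        print(query)
    print(facet_params("timestamp", 2010, [6, 7]))
```

Using the calendar module for the last day of each month avoids hand-typing end dates, including February in leap years.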

Re: Solr data type for date faceting

Jan Høydahl / Cominvent
Yes, I forgot that strings support lexicographic ranges, which match numeric order here because every value has the same number of digits.
However, they will potentially be very memory intensive, since you don't get the trie optimization and strings take up more space than numeric values. The only way to know is to try it out.
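
To see why a string range behaves like a numeric range for this format, here is a small Python sketch with made-up sample values. It uses Python's own string comparison as a stand-in for the index's term ordering, which is an assumption that holds for plain ASCII digits.

```python
# Sample YYYYMMDDHHMM values (invented for illustration).
stamps = [
    "201005312359",
    "201006010000",
    "201006152230",
    "201006302359",
    "201007010000",
]

low, high = "201006010000", "201006302359"

# Range check done purely on strings (lexicographic order).
by_string = [s for s in stamps if low <= s <= high]

# The same range check done on the numeric values.
by_number = [s for s in stamps if int(low) <= int(s) <= int(high)]

# For fixed-width, digit-only values the two selections agree.
print(by_string == by_number)

# Without fixed width the orders diverge: "9" sorts after "10" as a string.
print("9" > "10", 9 > 10)
```

The equivalence depends on every value having exactly 12 digits. Values of varying length would need zero-padding before a string range could be trusted.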


On 19. aug. 2010, at 05.20, Karthik K wrote:

> adding facet.query=timestamp:[201006010000+TO+201006312359]&facet.query=timestamp:[201007010000+TO+201007312359]...
> in query should give the desired response without changing the schema or
> re-indexing.