Is there a way to round data when index, but still able to return original content?

Jeffery Yuan
Hi, Solr developers and users:

I am wondering whether there is a way to round data at index time but still return the original content.

For example, take a date field with the value 2012-12-21T12:12:12Z. When searching, users only care about the date part, so I could round it to 2012-12-21T00:00:00Z when indexing it. This would reduce the index size, as there would be fewer terms.

But users still want to get the original content, so a matched doc should return 2012-12-21T12:12:12Z, not 2012-12-21T00:00:00Z.

The same applies to number and text fields.

Is there a way to do this in Solr?

Thanks for your reply :)

Re: Is there a way to round data when index, but still able to return original content?

Erick Erickson
Depends on whether the transformation is before or after the doc gets sent
to Solr. If you're changing the data before you give it to Solr, then you'd
have to have two fields, probably indexed=true and stored=false for the one
you search on, and indexed=false stored=true for the one you return to the
user.

This really doesn't take any more resources than using one field.
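
As a rough illustration of the two-field approach, the sketch below prepares a document on the client side before it is sent to Solr. The field names created and created_day and both helper functions are made up for this example, and it assumes the schema declares the first field as stored-only and the second as indexed-only.

```python
from datetime import datetime, timezone

SOLR_DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def round_to_day(iso_timestamp: str) -> str:
    """Truncate a Solr-style UTC timestamp to midnight of the same day."""
    parsed = datetime.strptime(iso_timestamp, SOLR_DATE_FORMAT).replace(tzinfo=timezone.utc)
    midnight = parsed.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight.strftime(SOLR_DATE_FORMAT)


def build_document(doc_id: str, created: str) -> dict:
    """Build a document carrying both the original and the rounded value.

    "created" and "created_day" are hypothetical field names:
    - created:     assumed indexed=false, stored=true (returned to the user)
    - created_day: assumed indexed=true, stored=false (used for searching)
    """
    return {
        "id": doc_id,
        "created": created,
        "created_day": round_to_day(created),
    }


if __name__ == "__main__":
    print(build_document("doc-1", "2012-12-21T12:12:12Z"))
```

Searches would then run against created_day, while the result list would ask for created.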

If you put some sort of (perhaps custom) filter in place, then the original
value would go in as stored and the altered value would get in the index
and you could do both in the same field.

Best
Erick


On Sat, Dec 8, 2012 at 2:34 PM, jefferyyuan <[hidden email]> wrote:

> Hi:
>
> I am wondering whether there is a way to round data at index time but still
> return the original content.
>
> For example, take a date field with the value 2012-12-21T12:12:12Z. When
> searching, users only care about the date part, so I could round it to
> 2012-12-21T00:00:00Z when indexing it. This would reduce the index size, as
> there would be fewer terms.
>
> But users still want to get the original content, so a matched doc should
> return 2012-12-21T12:12:12Z, not 2012-12-21T00:00:00Z.
>
> The same applies to number and text fields.
>
> Is there a way to do this in Solr?
>
> Thanks for your reply :)

Re: Is there a way to round data when index, but still able to return original content?

Jeffery Yuan
Erick, thanks for your reply.

I know how to implement solution 1.

But I have no idea how to implement solution 2, which you mentioned:
===>
If you put some sort of (perhaps custom) filter in place, then the original
value would go in as stored and the altered value would get in the index
and you could do both in the same field.

Can you please describe in more detail how to store the original data and index the altered value in the same field?

Thanks :)

RE: Is there a way to round data when index, but still able to return original content?

sswoboda
When you apply your analyzers/filters/tokenizers, the resulting value is what is kept in the index; however, the original input value is what is actually stored. For example, from the schema.xml file:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

This particular field type will strip out the HTML and lowercase the tokens. So if the input is:

<b>Hello</b>

It's being tokenized in the index as

hello

It's being stored (and hence returned to you) as

<b>Hello</b>

So you can create your own charFilter or filter class which converts your date for the indexer, but the original data will "automatically" be stored.
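
To make the indexed-versus-stored distinction concrete, here is a small toy model of the analysis chain above. It only imitates the three steps (strip tags, split on whitespace, lowercase) with plain string handling; it is not how Solr implements them, and the function names are invented for this sketch.

```python
import re
from typing import Dict, List

# Simplified stand-in for an HTML-stripping char filter (assumption: tags only).
TAG_PATTERN = re.compile(r"<[^>]+>")


def analyze(text: str) -> List[str]:
    """Imitate the chain: strip HTML tags, split on whitespace, lowercase."""
    without_tags = TAG_PATTERN.sub(" ", text)
    tokens = without_tags.split()
    return [token.lower() for token in tokens]


def index_field(value: str) -> Dict[str, object]:
    """Return what a field that is both stored and indexed would keep."""
    return {
        "stored": value,                  # the untouched input, returned in results
        "indexed_terms": analyze(value),  # the analyzed terms, used for matching
    }


if __name__ == "__main__":
    print(index_field("<b>Hello</b>"))
```

The stored side never passes through the analyzer, which is why the original markup comes back in search results.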

I hope this makes sense.

-----Original Message-----
From: jefferyyuan [mailto:[hidden email]]
Sent: Monday, December 10, 2012 10:24 AM
To: [hidden email]
Subject: Re: Is there a way to round data when index, but still able to return original content?

Erick, thanks for your reply.

I know how to implement solution 1.

But I have no idea how to implement solution 2, which you mentioned:
===>
If you put some sort of (perhaps custom) filter in place, then the original value would go in as stored and the altered value would get in the index and you could do both in the same field.

Can you please describe in more detail how to store the original data and index the altered value in the same field?

Thanks :)

RE: Is there a way to round data when index, but still able to return original content?

Jeffery Yuan
sswoboda, thanks for your explanation. Now I understand what is stored in Lucene :)

RE: Is there a way to round data when index, but still able to return original content?

Jeffery Yuan
In reply to this post by sswoboda
Sorry to ask another question, but I want to round date (TrieDateField) and TrieLongField values, and it seems these field types don't support configuring an analyzer (charFilter, tokenizer, or filter).

What should I do? I am now thinking of writing a custom date or long field type. Is there any other way? :)

Thanks :)

RE: Is there a way to round data when index, but still able to return original content?

sswoboda
Hi,

Nope...they don't. Generally, I am not sure I'd bother rounding this information to "reduce the index size." Have you determined how much index space you'd actually save? I am not confident that it'd be worth your time; i.e., I'd just go with indexing/storing the time information as well.

Regardless, if you do want to go this route, the only way I can think of that wouldn't be a "complicated" solution is to have one field that is indexed/rounded (and not stored) and another field that is just stored (and not indexed).

Hope this helps.

-----Original Message-----
From: jefferyyuan [mailto:[hidden email]]
Sent: Monday, December 10, 2012 3:14 PM
To: [hidden email]
Subject: RE: Is there a way to round data when index, but still able to return original content?

Sorry to ask another question, but I want to round date (TrieDateField) and TrieLongField values, and it seems these field types don't support configuring an analyzer (charFilter, tokenizer, or filter).

What should I do? I am now thinking of writing a custom date or long field type. Is there any other way? :)

Thanks :)

Re: Is there a way to round data when index, but still able to return original content?

Alexandre Rafalovitch
In reply to this post by Jeffery Yuan
Why are you trying to do that? Is that so you can have date/number buckets
for facets? The range queries may help with that.
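
One way to read this suggestion: leave the full timestamps in the index and let the query do the bucketing. The sketch below builds request parameters for a per-day range facet plus a plain range filter. The facet.range parameters are Solr's range faceting parameters; the field name created, the date bounds, and the helper names are assumptions made for this example.

```python
from urllib.parse import urlencode


def day_range_filter(field: str, start: str, end: str) -> str:
    """Build an inclusive range query such as created:[start TO end]."""
    return f"{field}:[{start} TO {end}]"


def day_facet_params(field: str, start: str, end: str) -> dict:
    """Request one facet bucket per day over [start, end) on a date field."""
    return {
        "q": "*:*",
        "rows": "0",                  # only the facet counts are wanted here
        "facet": "true",
        "facet.range": field,
        "facet.range.start": start,
        "facet.range.end": end,
        "facet.range.gap": "+1DAY",   # Solr date math: one bucket per day
    }


def to_query_string(params: dict) -> str:
    """URL-encode the parameters for a select request."""
    return urlencode(params)


if __name__ == "__main__":
    params = day_facet_params("created", "2012-12-01T00:00:00Z", "2013-01-01T00:00:00Z")
    print(to_query_string(params))
    print(day_range_filter("created", "2012-12-21T00:00:00Z", "2012-12-21T23:59:59Z"))
```

With this approach nothing has to be rounded at index time, at the cost of keeping every distinct timestamp in the index.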

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)



On Sun, Dec 9, 2012 at 9:34 AM, jefferyyuan <[hidden email]> wrote:

> Hi:
>
> I am wondering whether there is a way to round data at index time but still
> return the original content.
>
> For example, take a date field with the value 2012-12-21T12:12:12Z. When
> searching, users only care about the date part, so I could round it to
> 2012-12-21T00:00:00Z when indexing it. This would reduce the index size, as
> there would be fewer terms.
>
> But users still want to get the original content, so a matched doc should
> return 2012-12-21T12:12:12Z, not 2012-12-21T00:00:00Z.
>
> The same applies to number and text fields.
>
> Is there a way to do this in Solr?
>
> Thanks for your reply :)

RE: Is there a way to round data when index, but still able to return original content?

Jeffery Yuan
In reply to this post by sswoboda
Thanks for your reply.

For the date field, since we only care about the date part, we now round the date before indexing it, using a custom UpdateProcessor or a custom field type.

For a number field that stores file size, we create another field, sizeRounded, which is indexed but not stored, so that normal searches and facet searches on this field will be faster.
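
The post does not say how the file size is rounded, so the following is only a guess at the general shape: round the size down to a multiple of some granularity and put the result in sizeRounded next to the original value. The source field name size, the 1024-byte default, and the helper names are assumptions for this sketch.

```python
def round_size(size_bytes: int, granularity: int = 1024) -> int:
    """Round a size down to the nearest multiple of granularity (assumed rule)."""
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    return (size_bytes // granularity) * granularity


def with_rounded_size(doc: dict, granularity: int = 1024) -> dict:
    """Return a copy of doc with a sizeRounded value derived from doc["size"].

    "size" is a hypothetical name for the stored original field; sizeRounded
    is the indexed-but-not-stored field mentioned in the post.
    """
    enriched = dict(doc)
    enriched["sizeRounded"] = round_size(doc["size"], granularity)
    return enriched


if __name__ == "__main__":
    print(with_rounded_size({"id": "file-1", "size": 5000}))
```

A coarser granularity gives fewer distinct terms to search and facet on, while the exact size remains available from the original field.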