Lucene 6: Recommended way to store numeric values, given the need to form term vocabulary?

Maros Urbanec
Lucene beginner here, please excuse me if I’m asking anything obvious.

In Lucene 6, LongField and IntField were renamed to LegacyLongField and LegacyIntField and deprecated, with a Javadoc note suggesting the LongPoint and IntPoint classes instead.

However, it seems impossible to build a term vocabulary (=enumerate all distinct values) of these XPoint fields.

As a third option, one can add a field of class NumericDocValuesField. I tried hard to search through the documentation, but found no way of building a term vocabulary there either.

Is there a non-deprecated way of indexing a numeric field in Lucene 6, given the requirement to build a term vocabulary?

Re: Lucene 6: Recommended way to store numeric values, given the need to form term vocabulary?

Adrien Grand
Hi Maros,

Do you need to perform range queries? If not, you could index those numbers
like regular strings with StringField.

If yes, it is also possible with points, by returning INTERSECTS all the
time in the intersect visitor. The downside you might not like is that it
is a push API, while the TermsEnum API you were used to was a pull API.
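
To make the push-versus-pull distinction concrete, here is a minimal sketch in plain Java. It does not use Lucene at all: `ValueVisitor`, `pushAll`, `distinctViaPush` and `distinctViaPull` are hypothetical names invented for illustration, standing in for an intersect visitor that is handed every value and for a TermsEnum-style loop that asks for the next value. In real Lucene 6 code the "always intersects" answer would, as far as the 6.x Javadocs indicate, be returned from IntersectVisitor.compare() as Relation.CELL_CROSSES_QUERY.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.SortedSet;
import java.util.TreeSet;

final class PushVsPull {
    /** Hypothetical callback (not a Lucene type): receives every (doc, value) pair. */
    interface ValueVisitor {
        void visit(int docId, long value);
    }

    /** Push style: the data source drives the loop and calls the visitor. */
    static void pushAll(long[] valuesByDoc, ValueVisitor visitor) {
        for (int docId = 0; docId < valuesByDoc.length; docId++) {
            visitor.visit(docId, valuesByDoc[docId]);
        }
    }

    /** Builds the vocabulary by accumulating whatever the source pushes at us. */
    static SortedSet<Long> distinctViaPush(long[] valuesByDoc) {
        SortedSet<Long> distinct = new TreeSet<>();
        pushAll(valuesByDoc, (docId, value) -> distinct.add(value));
        return distinct;
    }

    /** Pull style: the caller drives the loop and asks for the next value. */
    static SortedSet<Long> distinctViaPull(Iterator<Long> values) {
        SortedSet<Long> distinct = new TreeSet<>();
        while (values.hasNext()) {
            distinct.add(values.next());
        }
        return distinct;
    }

    public static void main(String[] args) {
        long[] valuesByDoc = {42L, 7L, 42L, -3L, 7L};
        System.out.println(distinctViaPush(valuesByDoc)); // prints [-3, 7, 42]
        System.out.println(distinctViaPull(Arrays.asList(42L, 7L, 42L, -3L, 7L).iterator()));
    }
}
```

In the push variant the caller cannot stop early or interleave its own work between values without extra state in the visitor, which is the practical difference from the pull loop.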

RE: Lucene 6: Recommended way to store numeric values, given the need to form term vocabulary?

Maros Urbanec
Thanks for the feedback.

PointValues.intersect() seems to go through all the documents in my corpus. Are LongPoints and NumericDocValuesFields stored such that there is no way to get distinct values other than scanning through every single document? That would be completely fine with me; it is just not obvious from the Javadocs.

A side question that has been lingering in my head: many PointValues methods return byte[]. Is there any public API for decoding the values?
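
On the side question: for single-dimension points, the static helpers LongPoint.decodeDimension(byte[], int) and IntPoint.decodeDimension(byte[], int) turn a packed value back into a number. The sketch below is a plain-Java illustration of the encoding those helpers are understood to reverse for longs (sign bit flipped, then written big-endian, so that unsigned byte order matches numeric order). The class and its `encode`, `decode` and `compareUnsigned` methods are names made up here, not Lucene API; real code should call the Lucene helpers rather than reimplement them.

```java
final class SortableLongBytes {
    /** Flips the sign bit and writes the long big-endian (illustrative, not Lucene code). */
    static byte[] encode(long value) {
        long flipped = value ^ Long.MIN_VALUE; // negative values now sort before positive ones
        byte[] out = new byte[Long.BYTES];
        for (int i = 0; i < Long.BYTES; i++) {
            out[i] = (byte) (flipped >>> (56 - 8 * i));
        }
        return out;
    }

    /** Reads 8 big-endian bytes starting at offset and restores the sign bit. */
    static long decode(byte[] packed, int offset) {
        long flipped = 0L;
        for (int i = 0; i < Long.BYTES; i++) {
            flipped = (flipped << 8) | (packed[offset + i] & 0xFFL);
        }
        return flipped ^ Long.MIN_VALUE;
    }

    /** Compares two equal-length arrays byte by byte, treating bytes as unsigned. */
    static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i++) {
            int cmp = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        for (long v : new long[]{Long.MIN_VALUE, -1L, 0L, 1L, Long.MAX_VALUE}) {
            System.out.println(v + " -> " + decode(encode(v), 0));
        }
    }
}
```

Because the byte order matches the numeric order, the min and max byte[] arrays handed to a visitor can be compared directly, and only need decoding when a human-readable number is wanted.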
