Count terms for IntPoint field

Riccardo Tasso
Hello,
I'm porting an application from Lucene 4 to Lucene 7.

I've converted a field from IntField to IntPoint, and everything works at
query and indexing time.

However, when I call

reader.getSumTotalTermFreq(field);

it returns zero for my IntPoint field. I understand that IntPoint values are
stored in a dedicated data structure (the block k-d tree), but how can I get
the same result as in the previous version?

What is the best way to count the "number of terms" for an IntPoint field?

Can I also find the equivalent of "top terms", i.e. the list of the most
frequent values for a given field, with their counts?

Would it be the same if I used NumericDocValuesField?

Thanks,
Riccardo

Re: Count terms for IntPoint field

Adrien Grand
You probably want to look at PointValues.size(), which gives you the number
of indexed points. Doc values, however, do not support index statistics.

On Wed, Feb 28, 2018 at 21:47, Riccardo Tasso <[hidden email]> wrote:

> Hello,
>  I'm porting an application from lucene 4 to lucene 7.
>
> I've converted a field from IntField to IntPoint and at query or indexing
> time everything is ok.
>
> When I call the method:
>
> reader.getSumTotalTermFreq(field);
>
> it returns zero for my IntPoint field. I understand that IntPoint is stored
> in specific data structure (the block k-d tree), but how could I obtain the
> same result as in the previous version?
>
> Which is the best way to count the "number of terms" also for IntPoint?
>
> Can I also find the equivalent of "top terms", i.e. the list of more
> frequent values for a given field with their count?
>
> It would be the same if I will use the NumericDocValuesField?
>
> Thanks,
>  Riccardo
>

Re: Count terms for IntPoint field

Riccardo Tasso
Thanks. For doc values I can probably use DocValuesStatsCollector
and DocValuesStats.

2018-03-01 2:13 GMT+01:00 Adrien Grand <[hidden email]>:

> You probably want to look at PointValues.size(), which gives you the number
> of indexed points. Doc values do not support index statistics however.

Re: Count terms for IntPoint field

Riccardo Tasso
OK, I've studied the documentation.

First of all, what I needed for most of my fields (StringField, TextField)
is:

MultiFields.getTerms(reader, field.name).size();

which counts the distinct terms for the field.

For point fields the hint was right: PointValues.size() is what I need.

For doc values my question doesn't make sense, since there is no inverted
index for those fields.

Another edge case is stored-only fields. I think Lucene cannot provide a
count for those either.

Riccardo
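
The "top terms" part of the original question is not covered by
PointValues.size(), which reports the number of indexed points (duplicates
included) rather than a per-value breakdown. Below is a minimal sketch in
plain Java of counting the most frequent values. It assumes the int values
have already been read out of the index by some other means, which is not
shown here. The names TopValues, countValues and topN are hypothetical and
are not part of Lucene.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not a Lucene class: counts values that the caller
// has already extracted from the index.
public class TopValues {

    /** Counts how many times each value occurs in the input. */
    public static Map<Integer, Long> countValues(int[] values) {
        Map<Integer, Long> counts = new HashMap<>();
        for (int v : values) {
            counts.merge(v, 1L, Long::sum);
        }
        return counts;
    }

    /** Returns up to n entries, most frequent first; ties go to the smaller value. */
    public static List<Map.Entry<Integer, Long>> topN(Map<Integer, Long> counts, int n) {
        List<Map.Entry<Integer, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : Integer.compare(a.getKey(), b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) {
        int[] values = {7, 3, 7, 42, 3, 7};
        Map<Integer, Long> counts = countValues(values);
        System.out.println("total values:    " + values.length);
        System.out.println("distinct values: " + counts.size());
        for (Map.Entry<Integer, Long> e : topN(counts, 2)) {
            System.out.println(e.getKey() + " occurs " + e.getValue() + " times");
        }
    }
}
```

The sketch keeps every distinct value in memory, so it only suits fields
with a moderate number of distinct values. The total and the distinct count
are printed separately because they answer different questions: the first
is comparable to the number of indexed points, the second to the number of
distinct terms.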

2018-03-01 19:52 GMT+01:00 Riccardo Tasso <[hidden email]>:

> Thanks, probably for DocValues I can use DocValuesStatsCollector
> and DocValuesStats.