What is the benefit of stored="true" in *PointFields

Yasufumi Mizoguchi
Hi,

I am using Solr 7.6 and want to reduce the index size because of hardware
limitations.
I have already tried:
 1. setting indexed/stored/docValues to false in the schema for fields that
do not need them.
 2. setting compressionMode="BEST_COMPRESSION" in solrconfig.

These helped quite a bit, but I still need to reduce the index size.
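
For reference, the second step is usually configured through the codec factory in solrconfig.xml. The fragment below is a minimal sketch that assumes the standard SchemaCodecFactory is in use; the thread does not show the poster's actual configuration.

```xml
<!-- solrconfig.xml: prefer smaller stored-field data over retrieval speed -->
<codecFactory class="solr.SchemaCodecFactory">
  <str name="compressionMode">BEST_COMPRESSION</str>
</codecFactory>
```

This setting applies to stored field data, so it does not by itself change how docValues are written.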

I am now planning to set stored="false" on the *PointFields that are used
only for range queries, faceting, and sorting, because I think
docValues="true" is enough to retrieve a field's value thanks to the
useDocValuesAsStored parameter.
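
In schema terms, the plan described above might look like the following. This is only a sketch: the field name `price` and the type name `pint` are hypothetical, and useDocValuesAsStored already defaults to true for schema version 1.6 and later, so it is spelled out here only for clarity.

```xml
<!-- Point-based integer type with docValues enabled -->
<fieldType name="pint" class="solr.IntPointField" docValues="true"/>

<!-- Not stored; the value is read from docValues when the field is returned -->
<field name="price" type="pint" indexed="true" stored="false"
       docValues="true" useDocValuesAsStored="true"/>
```

With a definition like this, `price` should still appear in results for requests such as `fl=*`, with the value taken from docValues rather than from stored fields.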

But I also think this might lead to bad query performance...

So, are there any good suggestions about the stored and docValues settings
for *PointFields?

Thanks,
Yasufumi

Re: What is the benefit of stored="true" in *PointFields

Shawn Heisey-2
On 2/6/2019 12:42 AM, Yasufumi Mizoguchi wrote:

> I am using Solr 7.6 and want to reduce index size due to hardware
> limitation.
> I already tried to
>   1. set false to unnecessary field's indexed/stored/docValues parameter in
> schema.
>   2. set compressionMode="BEST_COMPRESSION" in solrconfig.
>
> These were quite good, but I still need to reduce index size.
>
> Then, I am now planning to set stored="false" in *PointFields only used for
> range query,
> faceting and sorting. Because I think that docValues="true" is enough to
> acquire field's
> value thanks to useDocValuesAsStored parameter.
>
> But I also think this might lead to bad query performance...

Stored values have pretty much zero bearing on query performance.

Stored is smaller than docValues -- it's compressed, and docValues aren't.

If you do not need docValues for some other aspect, like faceting or
sorting, then choose stored.  If you need docValues for something, then
choose docValues.
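
As an illustration of this rule of thumb, the two cases could be declared roughly as follows. The field names are hypothetical, and `pint` is assumed to be an IntPointField type defined elsewhere in the schema.

```xml
<!-- Only returned in results, never faceted or sorted on: stored only -->
<field name="page_count" type="pint" indexed="false" stored="true" docValues="false"/>

<!-- Used for faceting or sorting: docValues only; the value can still be
     returned through useDocValuesAsStored -->
<field name="price" type="pint" indexed="true" stored="false" docValues="true"/>
```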

Removing either docValues or stored on a numeric type is probably not
going to make much difference in the total size of the index unless
there are billions of documents.

On a point type, queries like "field:333" will be slow.  This is the
nature of a point type.  If you will frequently make queries for
individual values, the Trie types (deprecated, will be removed in 8.0)
are better.  Range queries (like "field:[444 TO 555]") perform best on a
point type.

Thanks,
Shawn

Re: What is the benefit of stored="true" in *PointFields

Yasufumi Mizoguchi
Hi, Shawn.

Thank you for your reply.

> Stored is smaller than docValues -- it's compressed, and docValues aren't.
Stored is indeed compressed, but I believed that docValues are also
compressed, using strategies that depend on the field's values and density,
as the following Javadoc says:
https://lucene.apache.org/core/7_6_0/core/org/apache/lucene/codecs/lucene70/Lucene70DocValuesFormat.html

> Removing either docValues or stored on a numeric type is probably not
> going to make much difference in the total size of the index unless
> there are billions of documents.

Yes, I tried stored="false" on some numeric fields, but it did not help much.
So, I am now trying stored="false" on some string fields...

Thank you for your advice,
Yasufumi.




Re: What is the benefit of stored="true" in *PointFields

Toke Eskildsen-2
On Thu, 2019-02-07 at 11:24 +0900, Yasufumi Mizoguchi wrote:
> Actually, stored is compressed but I believed that docValues was
> compressed
> in some strategies depending on
> field's values/density as following java doc says.
> https://lucene.apache.org/core/7_6_0/core/org/apache/lucene/codecs/lucene70/Lucene70DocValuesFormat.html

In scenarios with low diversity in Strings (city names for example),
DocValues de-duplication can work very well. It is hard to generally
compare the size of stored vs. doc values as the strategies are very
different and the relative difference is highly dependent on content.

As for query performance, Shawn is technically correct that there will
be no impact on query performance (as long as you don't use
indexed=false, docValues=true). But it does influence document
retrieval time. Under most circumstances the difference will be small,
but if you retrieve a large number of documents or your corpus is large
(measured in documents), it can be significant:


https://lucene.apache.org/solr/guide/7_6/docvalues.html#retrieving-docvalues-during-search

Specifically, the Solr 7 series has poor random access (used for
document retrieval) doc values performance for indexes with many
documents.

- Toke Eskildsen, Royal Danish Library