Optimize FTS memory footprint

elirev
Hi
I am using Elasticsearch 1.7.5, and our segment memory allocation per node is
very big; it seems to be related to the FST.

1) Is there any way to reduce/optimize its size (I understand it's the index for
the terms)?
2) Can an index optimize help?
3) Can the fact that we use nested objects dramatically increase the allocation
of memory for it?

Thanks

Re: Optimize FTS memory footprint

Michael McCandless-2
Are you sure it's FSTs using your heap?

Do you have many index fields that have high cardinality?  Or many
suggesters?

Mike McCandless

http://blog.mikemccandless.com

In reply to Eli Revach's post of Thu, Nov 16, 2017 at 5:03 PM

Re: Optimize FTS memory footprint

elirev
Thanks, Mike.
I did not find any clear way to tell whether it's the FST, norms, or something
else (unless I'm missing something); the fact that the FST is an in-memory
prefix index led me to think it is using most of the heap.
Our mapping is normal, with around 200 columns; one of the columns is a nested
object with a limited number of objects (up to 4 instances). We use monthly
indexes (keeping 6 months open). In the last month I see a dramatic extra
allocation of segment memory (around 30%, where in a regular month it is
around 5%). The only change I see is that the nested object now includes 8
instances on average, which increases the number of hidden documents in the
index (to more than twice as many).
When we optimized the index, the memory allocation was reduced (we saw it
only after a rolling restart of the nodes).

If you don't mind, I have a few questions:
1) Do you know of a way to figure out which component is taking all this
memory?
2) Do you see a relation between the increase in nested objects and the
extra memory allocation we have?
3) Is FST memory usage impacted by our optimizing the problematic index, and
why do we see it only after restarting the ES service?

Thanks, Mike

--
Sent from: http://lucene.472066.n3.nabble.com/Lucene-Java-Users-f532864.html

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Optimize FTS memory footprint

ybtsdst
Thanks, Mike. I'm facing a similar problem.
I'm running an Elasticsearch 2.0 cluster and find that the FST of the _uid
field takes a lot of memory. The _uid field is not analyzed, is generated by
Elasticsearch, and has high cardinality.
Is there any way to reduce the memory cost of the _uid field? Thanks.


In reply to elirev's post of 2017-11-29 5:47 GMT+08:00

Re: Optimize FTS memory footprint

elirev
Hi Yin,
How do you determine how much memory is allocated for your _uid field?

Re: Optimize FTS memory footprint

ybtsdst
Hi elirev,

The field "index" of class "org.apache.lucene.codecs.blocktree.FieldReader"
is the FST of each field; its type is FST<BytesRef>. I closed an index,
picked a shard, wrote some code to read the shard directly, and then used
reflection to get the actual FST object of the _uid field. Its ramBytesUsed()
method returns the memory cost of the FST.
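A minimal sketch of that reflection step, assuming only the JDK on the classpath: it reads a private field by name and calls ramBytesUsed() on the object stored there. FakeFieldReader and FakeFst are hypothetical stand-ins for Lucene's FieldReader and FST<BytesRef>, so only the reflective part carries over; against a real shard the owner object would be the Terms instance a leaf reader returns for _uid, provided it really is a blocktree FieldReader.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

// Sketch: read a private field reflectively and ask the object it holds for its
// ramBytesUsed(). FakeFieldReader/FakeFst are hypothetical stand-ins for Lucene's
// FieldReader and FST<BytesRef>; only the reflection part mirrors the real steps.
class PrivateFieldRam {

    /** Returns ramBytesUsed() of the object in owner's private field, or 0 if it is null. */
    static long ramBytesOfPrivateField(Object owner, String fieldName) {
        try {
            Field field = owner.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);                 // bypass the private modifier
            Object value = field.get(owner);
            if (value == null) {
                return 0L;                             // nothing loaded for this field
            }
            Method ramBytesUsed = value.getClass().getMethod("ramBytesUsed");
            return (Long) ramBytesUsed.invoke(value);  // primitive long comes back boxed
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot read field " + fieldName, e);
        }
    }

    // Hypothetical stand-in for an FST: reports a fixed size.
    static final class FakeFst {
        private final long bytes;
        FakeFst(long bytes) { this.bytes = bytes; }
        public long ramBytesUsed() { return bytes; }
    }

    // Hypothetical stand-in for FieldReader: keeps its FST in a private field named "index".
    static final class FakeFieldReader {
        private final FakeFst index;
        FakeFieldReader(FakeFst index) { this.index = index; }
    }

    public static void main(String[] args) {
        FakeFieldReader reader = new FakeFieldReader(new FakeFst(123_456L));
        System.out.println(ramBytesOfPrivateField(reader, "index"));  // prints 123456
    }
}
```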

In reply to elirev's post of 2017-12-12 1:05 GMT+08:00

Re: Optimize FTS memory footprint

Michael McCandless-2
In reply to this post by elirev
Comments below:

On Tue, Nov 28, 2017 at 4:47 PM, elirev <[hidden email]> wrote:

> 1) Do you know of a way to figure out which component is taking all this
> memory?
>

How about a profiler?  E.g. YourKit works well for this in my experience.

You should also use Lucene's own Accountable API from the leaf readers --
this gives you a very detailed breakdown of Lucene's own accounting of
what's using RAM.
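A sketch of the kind of breakdown an Accountable-style API gives: every node reports ramBytesUsed() and exposes its parts through getChildResources(), and a depth-first walk prints one indented line per node. RamNode, node() and all the numbers are hypothetical; they only mirror the shape of Lucene's Accountable so the example runs without Lucene on the classpath.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Sketch of walking a RAM-accounting tree shaped like Lucene's Accountable.
// RamNode is a hypothetical local stand-in, not Lucene's interface.
class RamBreakdown {

    interface RamNode {
        String name();
        long ramBytesUsed();
        default Collection<RamNode> getChildResources() { return Collections.emptyList(); }
    }

    /** Hypothetical helper: a node with a fixed name, size and children. */
    static RamNode node(String name, long bytes, RamNode... children) {
        List<RamNode> kids = List.of(children);
        return new RamNode() {
            public String name() { return name; }
            public long ramBytesUsed() { return bytes; }
            public Collection<RamNode> getChildResources() { return kids; }
        };
    }

    /** Depth-first walk producing one indented line per node. */
    static List<String> render(RamNode root) {
        List<String> out = new ArrayList<>();
        walk(root, 0, out);
        return out;
    }

    private static void walk(RamNode n, int depth, List<String> out) {
        out.add("  ".repeat(depth) + n.name() + ": " + n.ramBytesUsed() + " bytes");
        for (RamNode child : n.getChildResources()) {
            walk(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        // Made-up numbers, purely to show the shape of the output.
        RamNode segment = node("segment _0", 1_500,
                node("postings", 1_200,
                        node("field _uid terms index", 1_000),
                        node("field body terms index", 200)),
                node("norms", 300));
        render(segment).forEach(System.out::println);
    }
}
```

In this made-up tree each parent's figure is the sum of its children, so the largest consumer is found by following the biggest child at each level.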


> 2) Do you see a relation between the increase in nested objects and the
> extra memory allocation we have?
>

Well, nested objects cause more documents to be created, which may increase
the heap needed to hold some data structures, e.g. the live docs bitset.
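The arithmetic behind that point, as a back-of-the-envelope sketch: with nested mappings each top-level document is indexed as itself plus one hidden Lucene document per nested object, and any structure that keeps one bit per document grows in step. It assumes a uniform nested count per document and a plain long[] bitset (8 bytes per 64 documents, object headers ignored); the 100M top-level document figure is hypothetical. Lucene only keeps a live docs bitset for segments that have deletions.

```java
// Back-of-the-envelope sketch: nested objects multiply the Lucene doc count,
// and a one-bit-per-document structure grows with it. Figures are hypothetical.
class NestedDocMath {

    /** Each top-level doc is indexed as itself plus one hidden doc per nested object. */
    static long luceneDocs(long topLevelDocs, int nestedPerDoc) {
        return topLevelDocs * (1L + nestedPerDoc);
    }

    /** Bytes for one bit per document, rounded up to whole 64-bit words (headers ignored). */
    static long bitsetBytes(long maxDoc) {
        long words = (maxDoc + 63) / 64;
        return words * Long.BYTES;
    }

    public static void main(String[] args) {
        long topLevel = 100_000_000L;  // hypothetical number of top-level documents
        for (int nested : new int[] {4, 8}) {
            long docs = luceneDocs(topLevel, nested);
            System.out.printf("%d nested -> %,d Lucene docs, ~%,d bytes per bitset%n",
                    nested, docs, bitsetBytes(docs));
        }
    }
}
```

Under these assumptions, going from 4 to 8 nested objects takes the index from 500,000,000 to 900,000,000 Lucene documents, and each one-bit-per-document structure from about 62.5 MB to about 112.5 MB.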


> 3) Is FST memory usage impacted by our optimizing the problematic index, and
> why do we see it only after restarting the ES service?
>

Optimize also removes all pending deleted documents.  You should pull the
Accountables before and after the optimize to see where the changes really
were.
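One way to do that comparison, sketched under the assumption that each snapshot has already been flattened into a map from component name to bytes (how the maps are collected is left open): report every component whose size changed, largest absolute change first. Component names and numbers are made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: diff two RAM snapshots keyed by component name (e.g. before/after an optimize).
class RamDiff {

    static final class Change {
        final String component;
        final long before;
        final long after;
        Change(String component, long before, long after) {
            this.component = component;
            this.before = before;
            this.after = after;
        }
        long delta() { return after - before; }
    }

    /** Every component whose size changed, largest absolute change first. */
    static List<Change> diff(Map<String, Long> before, Map<String, Long> after) {
        Set<String> names = new HashSet<>(before.keySet());
        names.addAll(after.keySet());
        List<Change> changes = new ArrayList<>();
        for (String name : names) {
            long b = before.getOrDefault(name, 0L);  // missing component counts as 0 bytes
            long a = after.getOrDefault(name, 0L);
            if (a != b) {
                changes.add(new Change(name, b, a));
            }
        }
        changes.sort(Comparator.comparingLong((Change c) -> Math.abs(c.delta())).reversed());
        return changes;
    }

    public static void main(String[] args) {
        // Made-up numbers for illustration only.
        Map<String, Long> before = Map.of("terms index", 900L, "norms", 200L, "live docs", 50L);
        Map<String, Long> after = Map.of("terms index", 600L, "norms", 200L);
        for (Change c : diff(before, after)) {
            System.out.println(c.component + ": " + c.before + " -> " + c.after + " (" + c.delta() + ")");
        }
    }
}
```

For the sample maps, main prints the terms index line (a drop of 300) before the live docs line (a drop of 50) and omits norms, which did not change.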

Mike McCandless

http://blog.mikemccandless.com

Re: Optimize FTS memory footprint

Michael McCandless-2
In reply to this post by ybtsdst
Try upgrading Elasticsearch -- it's up to the 6.0 release now, from just a few
weeks ago -- since its (and Lucene's) memory usage has decreased over time.

The _uid field in particular will always be costly, unfortunately.  Since
it's a primary key, every term will be unique, and the term index has to
work hard to store all the prefixes for those keys.
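A rough way to see why unique keys are expensive for a prefix index: the sketch counts distinct prefixes (the nodes of a plain trie) for 10,000 zero-padded sequential ids and for 10,000 random hex ids of the same length. It is only an intuition aid; Lucene's block-tree terms index stores prefixes for blocks of terms rather than for every term, and an FST also shares suffixes, so the absolute numbers say nothing about real heap usage.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Illustration only: distinct prefixes = nodes of a plain trie (no suffix sharing).
class PrefixCount {

    /** Number of distinct non-empty prefixes across all keys. */
    static int distinctPrefixes(List<String> keys) {
        Set<String> prefixes = new HashSet<>();
        for (String key : keys) {
            for (int len = 1; len <= key.length(); len++) {
                prefixes.add(key.substring(0, len));
            }
        }
        return prefixes.size();
    }

    public static void main(String[] args) {
        int n = 10_000;
        List<String> sequential = new ArrayList<>();
        List<String> random = new ArrayList<>();
        Random rnd = new Random(42);  // fixed seed so runs are repeatable
        for (int i = 0; i < n; i++) {
            sequential.add(String.format("%016d", i));           // zero-padded counter
            random.add(String.format("%016x", rnd.nextLong()));  // random 16-hex-digit id
        }
        System.out.println("sequential: " + distinctPrefixes(sequential));
        System.out.println("random:     " + distinctPrefixes(random));
    }
}
```

The sequential ids produce 11,122 distinct prefixes (12 shared leading zeros, then 10 + 100 + 1,000 + 10,000); the random ids, which share almost nothing past their first few characters, produce about an order of magnitude more.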

Mike McCandless

http://blog.mikemccandless.com

In reply to Bingtao Yin's post of Sat, Dec 2, 2017 at 9:42 PM