How to Manage RAM Usage at Heavy Indexing

kamaci
I am running a test on my SolrCloud. I am trying to send 100 million documents via
Hadoop into a node that has no replica. When the document count sent to that node
reaches around 30 million, RAM usage on my machine hits 99% (Solr heap usage is not
at 99%; it uses just 3GB - 4GB of RAM). Some time later the node stops receiving
documents to index and the indexer job fails as well.

How can I force the OS cache to be cleared (if it is the OS cache that is blocking
me), or what else should I do (maybe send 10 million documents and wait a little,
etc.)? What do people do in heavy-indexing situations?

Re: How to Manage RAM Usage at Heavy Indexing

Erick Erickson
This is sounding like an XY problem. What are you measuring
when you say RAM usage is 99%? Is this virtual memory? See:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

What errors are you seeing when you say "my node stops receiving
documents"?

How are you sending 10M documents? All at once in a huge packet,
or some smaller number at a time? From where? How?

And what does Hadoop have to do with anything? Are you putting
the Solr index on Hadoop? How? The recent contrib?

In short, you haven't provided very many details. You've been around
long enough that I'm surprised you're saying "it doesn't work, how can
I fix it?" without providing much in the way of details to help us help
you.

Best
Erick



Re: How to Manage RAM Usage at Heavy Indexing

kamaci
Hi Erick,

I wanted to get a quick answer, which is why I asked my question that way.

The error is as follows:

INFO  - 2013-08-21 22:01:30.978; org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp=/solr path=/update params={wt=javabin&version=2} {add=[com.deviantart.reachmehere:http/gallery/, com.deviantart.reachstereo:http/, com.deviantart.reachstereo:http/art/SE-mods-313298903, com.deviantart.reachtheclouds:http/, com.deviantart.reachthegoddess:http/, com.deviantart.reachthegoddess:http/art/retouched-160219962, com.deviantart.reachthegoddess:http/badges/, com.deviantart.reachthegoddess:http/favourites/, com.deviantart.reachthetop:http/art/Blue-Jean-Baby-82204657 (1444006227844530177), com.deviantart.reachurdreams:http/, ... (163 adds)]} 0 38790
ERROR - 2013-08-21 22:01:30.979; org.apache.solr.common.SolrException; java.lang.RuntimeException: [was class org.eclipse.jetty.io.EofException] early EOF
    at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
    at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
    at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
    at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
    at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:393)
    at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:245)
    at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1812)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:365)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:948)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:722)
Caused by: org.eclipse.jetty.io.EofException: early EOF
    at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:65)
    at java.io.InputStream.read(InputStream.java:101)
    at com.ctc.wstx.io.UTF8Reader.loadMore(UTF8Reader.java:365)
    at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:110)
    at com.ctc.wstx.io.MergedReader.read(MergedReader.java:101)
    at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
    at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
    at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:992)
    at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4628)
    at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
    at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
    at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
    ... 36 more

ERROR - 2013-08-21 22:01:30.980; org.apache.solr.common.SolrException; null:java.lang.RuntimeException: [was class org.eclipse.jetty.io.EofException] early EOF
    at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
    at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
    at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
    at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
    at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:393)
    at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:245)
    at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1812)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)


I use Nutch (which uses Hadoop) to send documents from HBase to Solr. I am not
indexing documents inside Hadoop; I just send documents to my SolrCloud via
MapReduce jobs. Nutch sends documents roughly like this:

...
SolrServer solr = new CommonsHttpSolrServer(solrUrl);
...
private final List<SolrInputDocument> inputDocs = new ArrayList<SolrInputDocument>();
...
solr.add(inputDocs);
...

inputDocs holds a maximum of 1000 documents. After I add inputDocs to the Solr
server I clear the list, then fill it with the next 1000 documents, and repeat
until every document has been sent to SolrCloud. When all documents have been
sent to SolrCloud I call commit.
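
For reference, here is a minimal sketch of the batching pattern I described with SolrJ.
The class and method names, the 1000-document batch size, and the single commit at the
end are simplifications of what the job does, not the actual Nutch code:

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
    private static final int BATCH_SIZE = 1000; // assumed batch size
    private final List<SolrInputDocument> inputDocs = new ArrayList<SolrInputDocument>();
    private final SolrServer solr;

    public BatchIndexer(String solrUrl) throws Exception {
        this.solr = new CommonsHttpSolrServer(solrUrl);
    }

    // Called once per document produced by the MapReduce job.
    public void write(SolrInputDocument doc) throws Exception {
        inputDocs.add(doc);
        if (inputDocs.size() >= BATCH_SIZE) {
            solr.add(inputDocs); // send the current batch of 1000 documents
            inputDocs.clear();   // then start filling a new batch
        }
    }

    // Called when the job finishes: flush the last partial batch, then commit once.
    public void close() throws Exception {
        if (!inputDocs.isEmpty()) {
            solr.add(inputDocs);
            inputDocs.clear();
        }
        solr.commit();
    }
}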

My Hadoop job could not send documents into SolrCloud and stops sending them
(the Hadoop job fails). When I open my Solr admin page I see this:

Physical Memory        98.1%
Swap Space             NaN%
File Descriptor Count  2.5%
JVM-Memory             1.6%

All in all I think the problem is physical memory. I stopped indexing and physical
memory usage is still the same (it does not go down). My machine runs CentOS 6.4.
Should I drop the caches when the percentage goes up, or what do you do in this
kind of situation?
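
For reference, when I say "drop the caches" I mean something like the following
generic Linux commands (run as root). I have not tried this against the cluster
yet, so treat it as a sketch of what I am considering, not something I recommend:

# How much of "used" memory is actually cache the kernel can reclaim?
free -m
grep -E '^(MemFree|Buffers|Cached):' /proc/meminfo

# Diagnostic only: flush dirty pages, then drop page cache, dentries and inodes.
sync
echo 3 > /proc/sys/vm/drop_caches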



Re: How to Manage RAM Usage at Heavy Indexing

Dan Davis-2
This could be an operating-system problem rather than a Solr problem.
CentOS 6.4 (Linux kernel 2.6.32) may have some issues with page flushing,
and I would read up on that.
The VM parameters can be tuned in /etc/sysctl.conf.

Re: How to Manage RAM Usage at Heavy Indexing

kamaci
Is there any reference that says something about that bug?


Re: How to Manage RAM Usage at Heavy Indexing

P Williams
Hi,

I've been seeing the same thing on CentOS: high physical memory use with low
JVM-Memory use.  I came to the conclusion that this is expected behaviour.  Using
top I noticed that my solr user's java process has virtual memory allocated of
about twice the size of the index, while resident memory is within the limits I set
when Jetty starts.  I infer from this that 98% of physical memory is being used to
cache the index.  Walter, Erick and others are constantly reminding people on the
list to have RAM the size of the index available -- I think 98% physical memory use
is exactly why.  Here is an excerpt from Uwe Schindler's well-written piece
<http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html> which
explains it in greater detail:

"Basically mmap does the same like handling the Lucene index as a swap file. The mmap() syscall tells the O/S kernel to virtually map our whole index files into the previously described virtual address space, and make them look like RAM available to our Lucene process. We can then access our index file on disk just like it would be a large byte[] array (in Java this is encapsulated by a ByteBuffer interface to make it safe for use by Java code). If we access this virtual address space from the Lucene code we don’t need to do any syscalls, the processor’s MMU and TLB handles all the mapping for us. If the data is only on disk, the MMU will cause an interrupt and the O/S kernel will load the data into file system cache. If it is already in cache, MMU/TLB map it directly to the physical memory in file system cache. It is now just a native memory access, nothing more! We don’t have to take care of paging in/out of buffers, all this is managed by the O/S kernel. Furthermore, we have no concurrency issue, the only overhead over a standard byte[] array is some wrapping caused by Java’s ByteBuffer interface (it is still slower than a real byte[] array, but that is the only way to use mmap from Java and is much faster than all other directory implementations shipped with Lucene). We also waste no physical memory, as we operate directly on the O/S cache, avoiding all Java GC issues described before."

Is it odd that my index is ~16GB but top shows 30GB in virtual memory?
 Would the extra be for the field and filter caches I've increased in size?
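
One way I know to see where that extra virtual size comes from is to look at the
process's memory map: most of it should be mmap'ed index files rather than anonymous
(heap) memory. A rough sketch, with the caveat that the pgrep pattern and the index
path are assumptions about my own setup:

SOLR_PID=$(pgrep -f start.jar)           # assumes Jetty was started via start.jar
pmap -x $SOLR_PID | sort -n -k2 | tail   # largest mappings last (sizes in kB)
# Mappings whose file name points under the index directory (e.g. .../data/index)
# are the mmap'ed segment files; they consume address space, not extra RAM.
pmap -x $SOLR_PID | grep data/index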

I went through a few Java tuning steps relating to OutOfMemoryErrors when using the
DataImportHandler with Solr.  The first thing is that when using the
FileEntityProcessor, an entry is made and stored in the heap for each file in the
file system to be indexed, before any indexing actually occurs.  When I started
pointing this at very large directories I started running out of heap.  One
work-around is to divide the job into smaller batches, but I was able to allocate
more memory so that everything fit.  The next thing is that, with more memory
allocated, the limiting factor was too many open files.  After allowing the solr
user to open more files I was able to get past this as well.  There was a sweet
spot where indexing with just enough memory was slow enough that I didn't hit the
too-many-open-files error, but why go slow?  Now I'm able to index ~4M documents
(newspaper articles and full-text monographs) in about 7 hours.
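
For the "too many open files" part, what I did is the usual Linux fix: raise the
open-file limit for the user that runs Solr. A sketch of it, where the user name
"solr" and the 65536 value are assumptions to adjust for your own setup:

# Check the current limit as the user that runs Solr/Jetty
ulimit -n

# /etc/security/limits.conf -- raise the per-user open-file limit
solr  soft  nofile  65536
solr  hard  nofile  65536

# Log the user out and back in (or restart Jetty), then verify with ulimit -n again.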

I hope someone will correct me if I'm wrong about anything I've said here
and especially if there is a better way to do things.

Best of luck,
Tricia



Re: How to Manage RAM Usage at Heavy Indexing

Shawn Heisey-4
On 9/9/2013 10:35 AM, P Williams wrote:
> Is it odd that my index is ~16GB but top shows 30GB in virtual memory?
>   Would the extra be for the field and filter caches I've increased in size?

This should probably be a new thread, but it might have some
applicability here, so I'm replying.

I have noticed some inconsistencies in memory reporting on Linux with
regard to Solr.  Here's a screenshot of top on one of my production
systems, sorted by memory:

https://www.dropbox.com/s/ylxm0qlcegithzc/prod-top-sort-mem.png

The virtual memory size for the top process is right in line with my
index size, plus a few gigabytes for the Java heap.  Something to note as you
ponder these numbers: my Java heap is only 6GB, and Java has allocated the
entire 6GB.  The other two Java processes are homegrown Solr-related
applications.

What's odd is the resident and shared memory sizes.  I have pretty much
convinced myself that the shared memory size is misreported.  If you add
up the numbers for cached and free, you get a total of 53659264 ...
about 11GB shy of the 64GB total memory.

If the reported resident memory for the Solr Java process (17GB) were
accurate, it would exceed total physical memory by several gigabytes and
there would be swap in use, but as you can see, there is no swap in use.
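
If anyone wants to reproduce the accounting, the raw counters come straight from
/proc/meminfo; what top reports as cached plus free is roughly what the kernel can
hand back on demand. A quick sketch, where the arithmetic is only illustration:

# Raw counters (in kB) that top and free summarize
grep -E '^(MemTotal|MemFree|Buffers|Cached|SwapTotal|SwapFree):' /proc/meminfo

# Rough "really in use" figure: total minus free minus reclaimable cache
awk '/^MemTotal:/{t=$2} /^MemFree:/{f=$2} /^Buffers:/{b=$2} /^Cached:/{c=$2}
     END{printf "In use excluding cache: %.1f GB\n", (t-f-b-c)/1048576}' /proc/meminfo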

Recently I overheard a conversation between Lucene committers in a
Lucene IRC channel that seemed to be discussing this phenomenon.  There
is apparently some issue with certain mmap modes that results in the
operating system's shared memory number going up even though no actual
memory is being consumed.

Thanks,
Shawn