Recovery problem in solrcloud

Recovery problem in solrcloud

cooljam
Hi,
    I have a large index, more than 200 GB, with two Solr instances in a
shard. The leader starts up fine, but the peer always hits an OOM when it
starts up. The peer always downloads the index files from the leader
because of the recoveringAfterStartup property in RecoveryStrategy; the
total time taken for the download is 2350 secs. If the peer's data
directory is empty, that is fine, but the leader and the peer have the same
generation number, so why does the peer recover?

thanks
cooljam
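
One way to compare the generation on the leader and the peer is to ask each core's replication handler for its index version. The following is a minimal sketch, assuming Java 11 or later, that the /replication handler is enabled on both cores, and that the two core URLs are passed on the command line. The class and method names are illustrative and not part of Solr, and the values reported can depend on how the handler is configured, so the output is a hint rather than proof.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ReplicationGenerationCheck {

    // Build the URL of the replication handler's indexversion command for one core.
    static String indexVersionUrl(String coreUrl) {
        String base = coreUrl.endsWith("/")
                ? coreUrl.substring(0, coreUrl.length() - 1)
                : coreUrl;
        return base + "/replication?command=indexversion&wt=json";
    }

    // Fetch the raw response body; it is expected to carry "indexversion" and "generation".
    static String fetch(HttpClient client, String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: ReplicationGenerationCheck <leaderCoreUrl> <peerCoreUrl>");
            return;
        }
        HttpClient client = HttpClient.newHttpClient();
        // Print both responses so the two generation values can be compared by eye.
        for (String coreUrl : args) {
            System.out.println(coreUrl + " -> " + fetch(client, indexVersionUrl(coreUrl)));
        }
    }
}
```

If both cores report the same generation and the peer still downloads the full index at startup, that points at the recovery logic rather than at a real difference between the indexes.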

RE: Recovery problem in solrcloud

Markus Jelsma-2
Perhaps this describes your problem:
https://issues.apache.org/jira/browse/SOLR-3685

 
 

Re: Recovery problem in solrcloud

Mark Miller-3
In reply to this post by cooljam

On Aug 7, 2012, at 5:49 AM, Jam Luo <[hidden email]> wrote:

> Hi
>    I have a large index, more than 200 GB, with two Solr instances
> in a shard. The leader starts up fine, but the peer always hits an
> OOM when it starts up.

Can you share the OOM msg and stacktrace please?

> The peer always downloads the index files from the leader because
> of the recoveringAfterStartup property in RecoveryStrategy; the total
> time taken for the download is 2350 secs. If the peer's data directory
> is empty, that is fine, but the leader and the peer have the same
> generation number, so why does the peer recover?

We are looking into this.

>
> thanks
> cooljam

- Mark Miller
lucidimagination.com

Re: Recovery problem in solrcloud

Mark Miller-3
Still no idea on the OOM - please send the stacktrace if you can.

As for doing a replication recovery when it should not be necessary, Yonik committed a fix for that a little while ago.

- Mark Miller
lucidimagination.com

Re: Recovery problem in solrcloud

cooljam
Aug 06, 2012 10:05:55 AM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
        at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:456)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:284)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:499)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
        at org.eclipse.jetty.server.Server.handle(Server.java:351)
        at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
        at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
        at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
        at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954)
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
        at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
        at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
        at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.OutOfMemoryError: Java heap space
        at org.apache.lucene.util.FixedBitSet.<init>(FixedBitSet.java:54)
        at org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(MultiTermQueryWrapperFilter.java:104)
        at org.apache.lucene.search.ConstantScoreQuery$ConstantWeight.scorer(ConstantScoreQuery.java:129)
        at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:318)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:507)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
        at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1394)
        at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1269)
        at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:384)
        at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:420)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:204)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1544)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:499)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
        at org.eclipse.jetty.server.Server.handle(Server.java:351)
        at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
        at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
        at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)

    This error often appears at startup. Nothing is being written to the
index, but there are a lot of query requests. If I stop the queries for
more than ten minutes, the Solr instance starts normally.
    The index in the Solr data directory is 200 GB+, the machine has 16 GB
of RAM, and the JVM options are:
 -Xmx10g
 -Xss256k
 -Xmn512m
 -XX:+UseCompressedOops
    The OOM and the peer's startup failure may be unrelated, but the two
often happen on the same Solr instance at the same time.

    I can provide the full log file if you want.

thanks

Re: Recovery problem in solrcloud

Yonik Seeley-2-2
Stack trace looks normal - it's just a multi-term query instantiating
a bitset.  The memory is being taken up somewhere else.
How many documents are in your index?
Can you get a heap dump or use some other memory profiler to see
what's taking up the space?

> If I stop the queries for more than ten minutes, the Solr instance starts normally.

Maybe queries are piling up in threads before the server is ready to
handle them and then trying to handle them all at once gives an OOM?
Is this live traffic or a test?  How many concurrent requests get sent?

-Yonik
http://lucidimagination.com
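
To put a rough number on the bitset that the stack trace is allocating, here is a minimal sketch of the arithmetic. It assumes one bit per document packed into 64-bit words, and it takes the document count from the segment description "C457041702" that appears in the log posted later in this thread (reading that figure as a document count is an assumption). The class and method names are illustrative, and the estimate ignores array headers, caches and every other allocation.

```java
public class BitSetHeapEstimate {

    // Number of 64-bit words needed to hold one bit per document (rounded up).
    static long wordsFor(long maxDoc) {
        return (maxDoc + 63) / 64;
    }

    // Bytes in the backing long[] for one bitset; the array header is ignored.
    static long bytesFor(long maxDoc) {
        return wordsFor(maxDoc) * Long.BYTES;
    }

    // How many such bitsets would fit in a heap of the given size if nothing else used it.
    static long bitSetsThatFit(long heapBytes, long maxDoc) {
        return heapBytes / bytesFor(maxDoc);
    }

    public static void main(String[] args) {
        long maxDoc = 457_041_702L;            // assumed from "C457041702" in the thread's log
        long heap = 10L * 1024 * 1024 * 1024;  // -Xmx10g from the reported JVM options
        System.out.println("bytes per bitset: " + bytesFor(maxDoc));
        System.out.println("bitsets that fit in heap: " + bitSetsThatFit(heap, maxDoc));
    }
}
```

With these inputs each bitset comes to roughly 57 MB, so somewhat under two hundred of them held at once would fill a 10 GB heap on their own. That is consistent with the idea that a burst of concurrent multi-term queries at startup could trigger the OOM, although it does not prove it.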



Re: Recovery problem in solrcloud

cooljam
There are 400 million documents in a shard, and each document is less than 1 KB.
The stored-fields file (_**.fdt) is 149 GB.
Does recovery need a lot of memory while downloading, or after the download has finished?

I found the following in the log before the OOM:
Aug 06, 2012 9:43:04 AM org.apache.solr.core.SolrCore execute
INFO: [blog] webapp=/solr path=/select params={sort=createdAt+desc&distrib=false&collection=today,blog&hl.fl=content&wt=javabin&hl=false&rows=10&version=2&f.content.hl.fragsize=0&fl=id&shard.url=index35:8983/solr/blog/&NOW=1344217556702&start=0&q=((("somewordsA"+%26%26+"somewordsB"+%26%26+"somewordsC")+%26%26+platform:abc)+||+id:"/")+%26%26+(createdAt:[2012-07-30T01:43:28.462Z+TO+2012-08-06T01:43:28.462Z])&_system=business&isShard=true&fsv=true&f.title.hl.fragsize=0} hits=0 status=0 QTime=95
Aug 06, 2012 9:43:05 AM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=1
        commit{dir=/home/ant/jetty/solr/data/index.20120801114027,segFN=segments_aui,generation=14058,filenames=[_cdnu_nrm.cfs, _cdnu_0.frq, segments_aui, _cdnu.fdt, _cdnu_nrm.cfe, _cdnu_0.tim, _cdnu.fdx, _cdnu.fnm, _cdnu_0.prx, _cdnu_0.tip, _cdnu.per]
Aug 06, 2012 9:43:05 AM org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: newest commit = 14058
Aug 06, 2012 9:43:05 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit{flags=0,version=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
Aug 06, 2012 9:43:05 AM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening Searcher@13578a09 main
Aug 06, 2012 9:43:05 AM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to Searcher@13578a09main{StandardDirectoryReader(segments_aui:1269420 _cdnu(4.0):C457041702)}
Aug 06, 2012 9:43:05 AM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
Aug 06, 2012 9:43:05 AM org.apache.solr.core.SolrCore registerSearcher
INFO: [blog] Registered new searcher Searcher@13578a09main{StandardDirectoryReader(segments_aui:1269420 _cdnu(4.0):C457041702)}
Aug 06, 2012 9:43:05 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Aug 06, 2012 9:43:05 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [blog] webapp=/solr path=/update params={waitSearcher=true&commit_end_point=true&wt=javabin&commit=true&version=2} {commit=} 0 1439
Aug 06, 2012 9:43:05 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit{flags=0,version=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
Aug 06, 2012 9:43:05 AM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening Searcher@1a630c4d main
Aug 06, 2012 9:43:05 AM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to Searcher@1a630c4dmain{StandardDirectoryReader(segments_aui:1269420 _cdnu(4.0):C457041702)}
Aug 06, 2012 9:43:05 AM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
Aug 06, 2012 9:43:05 AM org.apache.solr.core.SolrCore registerSearcher
INFO: [blog] Registered new searcher Searcher@1a630c4dmain{StandardDirectoryReader(segments_aui:1269420 _cdnu(4.0):C457041702)}
Aug 06, 2012 9:43:05 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Aug 06, 2012 9:43:07 AM org.apache.solr.core.SolrCore execute
INFO: [blog] webapp=/solr path=/select params={sort=createdAt+desc&distrib=false&collection=today,blog&hl.fl=content&wt=javabin&hl=false&rows=10&version=2&f.content.hl.fragsize=0&fl=id&shard.url=index35:8983/solr/blog/&NOW=1344217558778&start=0&_system=business&q=(((somewordsD)+%26%26+platform:(abc))+||+id:"/")+%26%26+(createdAt:[2012-07-30T01:43:30.537Z+TO+2012-08-06T01:43:30.537Z])&isShard=true&fsv=true&f.title.hl.fragsize=0} hits=0 status=0 QTime=490

Apart from these entries, everything else in the log for those few minutes
is "path=/select ******"; there were no add-document requests in the
cluster at that time. Is that related to the OOM?

This is live traffic, so I can't test it often. Tonight I added the
-XX:+HeapDumpOnOutOfMemoryError option. If the problem appears again I will
get a heap dump, but I am not sure I can analyse it and reach a conclusion,
so I may ask for your help.

thanks
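
Before waiting for the next OOM, it can be worth confirming that the heap dump option is really in effect for a given set of JVM options. The following is a minimal sketch, assuming a HotSpot JVM where the com.sun.management diagnostic bean is available. It reports on whichever JVM runs it, so it says something about Solr only if it is run with the same options or inside the Solr process. The class and method names are illustrative.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class HeapDumpFlagCheck {

    // Returns the flag's current value as a string ("true" or "false") for this JVM.
    static String heapDumpOnOom() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption("HeapDumpOnOutOfMemoryError").getValue();
    }

    public static void main(String[] args) {
        System.out.println("HeapDumpOnOutOfMemoryError = " + heapDumpOnOom());
        // Maximum heap the JVM will attempt to use, for comparison with -Xmx.
        System.out.println("max heap bytes = " + Runtime.getRuntime().maxMemory());
    }
}
```

Running it with the same options as Solr (for example -Xmx10g -XX:+HeapDumpOnOutOfMemoryError) is a quick check that the option is spelled correctly and accepted by the JVM in use.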
