How to determine why solr stops running?

Ryan W
Hi all,

I manage a site where solr has stopped running a couple times in the past
week. The server hasn't been rebooted, so that's not the reason.  What else
causes solr to stop running?  How can I investigate why this is happening?

Thank you,
Ryan

Re: How to determine why solr stops running?

James Greene
Check the log for an OOM crash.  Fatal exceptions will be in the main
solr log and out-of-memory errors will be in their own -oom log.

I've encountered quite a few solr crashes and usually it's when there's a
threshold of concurrent users and/or indexing happening.
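
A minimal sketch of that check, assuming the logs sit in a single directory such as /var/solr/logs. The helper name find_oom_evidence, the example path, and the '*oom*' file-name pattern are illustrative assumptions, not something a particular Solr version guarantees:

```shell
# Hypothetical helper: list OOM-related evidence in a Solr log directory.
# $1 = the log directory to inspect.
find_oom_evidence() {
    logdir="$1"
    # Any file whose name mentions "oom" (for example one left by an OOM kill script).
    find "$logdir" -type f -name '*oom*' 2>/dev/null
    # Any log file whose contents mention the Java exception itself.
    grep -l 'OutOfMemoryError' "$logdir"/* 2>/dev/null
    # Finding nothing is a valid result, so never report failure.
    return 0
}

# Example call (path is an assumption): find_oom_evidence /var/solr/logs
```

An empty result does not prove there was no OOM, because the exception is not always logged, but any file it prints is worth reading first.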



On Thu, May 14, 2020, 9:23 AM Ryan W <[hidden email]> wrote:

> Hi all,
>
> I manage a site where solr has stopped running a couple times in the past
> week. The server hasn't been rebooted, so that's not the reason.  What else
> causes solr to stop running?  How can I investigate why this is happening?
>
> Thank you,
> Ryan
>

Re: How to determine why solr stops running?

Ryan W
I don't see any log file with "oom" in the file name.  Does that mean there
hasn't been an out-of-memory issue?  Thanks.

On Thu, May 14, 2020 at 10:05 AM James Greene <[hidden email]>
wrote:

> Check the log for an OOM crash.  Fatal exceptions will be in the main
> solr log and out of memory errors will be in their own -oom log.
>
> I've encountered quite a few solr crashes and usually it's when there's a
> threshold of concurrent users and/or indexing happening.

Re: How to determine why solr stops running?

Erick Erickson
Probably, but check that you are running with the oom-killer; it'll be in
your start params.

But absent that, something external will be the culprit. Solr doesn't stop
by itself. Do look at the Solr log once things stop; it should show if
someone or something stopped it.

On Mon, May 18, 2020, 10:43 Ryan W <[hidden email]> wrote:

> I don't see any log file with "oom" in the file name.  Does that mean there
> hasn't been an out-of-memory issue?  Thanks.

Re: How to determine why solr stops running?

Ryan W
Is there a config file containing the start params?  I run solr like...

bin/solr start

I have not seen anything in the logs that seems informative. When I grep in
the logs directory for 'memory', I see nothing besides a couple entries
like...

2020-05-14 13:05:56.155 INFO  (main) [   ] o.a.s.h.a.MetricsHistoryHandler
No .system collection, keeping metrics history in memory.

I don't know what that entry means, though the date does roughly coincide
with the last time solr stopped running.

Thank you.


On Mon, May 18, 2020 at 12:00 PM Erick Erickson <[hidden email]>
wrote:

> Probably, but check that you are running with the oom-killer, it'll be in
> your start params.
>
> But absent that, something external will be the culprit, Solr doesn't stop
> by itself. Do look at the Solr log once things stop, it should show if
> someone or something stopped it.

Re: How to determine why solr stops running?

Erick Erickson
ps aux | grep solr

on a *nix system will show you all the runtime parameters.
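
A small sketch for reading that output, assuming the option to look for is the JVM's -XX:OnOutOfMemoryError hook. That is my understanding of what the start script adds, so treat the option name as an assumption and compare it with your own ps output. The helper name has_oom_hook is hypothetical:

```shell
# Hypothetical helper: report whether one command line contains the JVM
# -XX:OnOutOfMemoryError hook (assumed to be the option of interest).
# $1 = a full command line, e.g. one line of ps output.
has_oom_hook() {
    case "$1" in
        *-XX:OnOutOfMemoryError=*) echo "oom hook present" ;;
        *) echo "oom hook absent" ;;
    esac
}

# Example: check every running process whose command line mentions solr.
# The [s] keeps grep from matching its own command line.
#   ps aux | grep '[s]olr' | while IFS= read -r line; do has_oom_hook "$line"; done
```

On systems where ps truncates long lines, a wider form such as ps auxww may be needed to see the full parameter list.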

> On May 18, 2020, at 12:46 PM, Ryan W <[hidden email]> wrote:
>
> Is there a config file containing the start params?  I run solr like...
>
> bin/solr start
>
> I have not seen anything in the logs that seems informative. When I grep in
> the logs directory for 'memory', I see nothing besides a couple entries
> like...
>
> 2020-05-14 13:05:56.155 INFO  (main) [   ] o.a.s.h.a.MetricsHistoryHandler
> No .system collection, keeping metrics history in memory.
>
> I don't know what that entry means, though the date does roughly coincide
> with the last time solr stopped running.
>
> Thank you.

Re: How to determine why solr stops running?

James Greene
I usually do a combination of grepping for ERROR in solr logs and checking
journalctl to see if an external program may have killed the process.
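
Roughly what that combination could look like, as a sketch. It assumes log lines start with a "YYYY-MM-DD HH:MM:SS" timestamp, as in the MetricsHistoryHandler line quoted earlier in this thread; the function names and the example log path are made up for illustration:

```shell
# Hypothetical helper: print ERROR-level lines from one Solr log for one day.
# $1 = log file, $2 = date prefix in the log's own format (e.g. 2020-06-04).
errors_on_day() {
    grep "^$2 " "$1" | grep ' ERROR ' || true
}

# Hypothetical helper: search the systemd journal for the same day.
# Uses only the --no-pager, --since and --until options of journalctl.
journal_on_day() {
    journalctl --no-pager --since "$1 00:00:00" --until "$1 23:59:59" \
        | grep -i -e solr -e 'killed process' || true
}

# Example calls (path is an assumption):
#   errors_on_day /var/solr/logs/solr.log 2020-06-04
#   journal_on_day 2020-06-04
```

Both helpers return quietly when nothing matches, so an empty result just means that source has nothing to say about that day.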


Cheers,

/************************************
*       James Austin Greene
*  www.jamesaustingreene.com
*              336-lol-nerd
************************************/


On Mon, May 18, 2020 at 1:39 PM Erick Erickson <[hidden email]>
wrote:

> ps aux | grep solr
>
> on a *nix system will show you all the runtime parameters.

Re: How to determine why solr stops running?

Ryan W
Happened again today. Solr stopped running. Apache hasn't stopped in 10
days, so this is not due to a server reboot.

Solr is not being run with the oom-killer.  And when I grep for ERROR in
the logs, there is nothing from today.

On Mon, May 18, 2020 at 3:15 PM James Greene <[hidden email]>
wrote:

> I usually do a combination of grepping for ERROR in solr logs and checking
> journalctl to see if an external program may have killed the process.
>
>
> Cheers,
>
> /************************************
> *       James Austin Greene
> *  www.jamesaustingreene.com
> *              336-lol-nerd
> ************************************/

Re: How to determine why solr stops running?

Radu Gheorghe
Hi Ryan,

If Solr auto-restarts, I suppose it's systemd doing that. When it restarts
the Solr service, systemd should log this (maybe something like: journalctl
--no-pager | grep -i solr).

Then you can go in your Solr logs and check what happened right before that
time. Also, check system logs for what happened before Solr was restarted.
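
To look at "right before that time" in a Solr log, something like the following sketch may help. It assumes each log entry starts with a "YYYY-MM-DD HH:MM:SS" timestamp (as in the log line quoted earlier in the thread); the name lines_before and the example path are hypothetical:

```shell
# Hypothetical helper: show the last few log lines at or before a given time.
# $1 = log file, $2 = cutoff "YYYY-MM-DD HH:MM:SS", $3 = line count (default 20).
lines_before() {
    awk -v cutoff="$2" '
        # Timestamped lines start with a date. Compare the first 19 characters
        # as plain strings, which orders correctly for this date format.
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / { keep = (substr($0, 1, 19) <= cutoff) }
        # Lines without a timestamp (stack traces) follow the entry above them.
        keep { print }
    ' "$1" | tail -n "${3:-20}"
}

# Example call (path is an assumption):
#   lines_before /var/solr/logs/solr.log '2020-06-04 19:20:00' 50
```

The cutoff would be whatever time the journal reports for the stop or restart.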

Best regards,
Radu

https://sematext.com/

On Thu, Jun 4, 2020, 19:24 Ryan W <[hidden email]> wrote:

> Happened again today. Solr stopped running. Apache hasn't stopped in 10
> days, so this is not due to a server reboot.
>
> Solr is not being run with the oom-killer.  And when I grep for ERROR in
> the logs, there is nothing from today.

Re: How to determine why solr stops running?

Ryan W
"If Solr auto-restarts"

It doesn't auto-restart.  Is there some auto-restart functionality?  I'm
not aware of that.

On Mon, Jun 8, 2020 at 7:10 AM Radu Gheorghe <[hidden email]>
wrote:

> Hi Ryan,
>
> If Solr auto-restarts, I suppose it's systemd doing that. When it restarts
> the Solr service, systemd should log this (maybe something like: journalctl
> --no-pager | grep -i solr).
>
> Then you can go in your Solr logs and check what happened right before that
> time. Also, check system logs for what happened before Solr was restarted.
>
> Best regards,
> Radu
>
> https://sematext.com/

Re: How to determine why solr stops running?

Radu Gheorghe
I assumed it does, based on your description. If you installed it as a service (systemd), then systemd can start the service again if it fails. (something like Restart=always in your [Service] definition).
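
A quick way to see which restart policy a unit file asks for, as a sketch. The unit path and service name in the example are assumptions, and the helper name restart_policy is made up; it falls back to "no", which is systemd's default when Restart= is not set:

```shell
# Hypothetical helper: print the Restart= policy from a systemd unit file.
# $1 = path to the unit file.
restart_policy() {
    # Take the last Restart= line, since a later assignment overrides earlier ones.
    policy=$(sed -n 's/^Restart[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1)
    # systemd's default when the setting is absent is "no".
    echo "${policy:-no}"
}

# Example call (path is an assumption):
#   restart_policy /etc/systemd/system/solr.service
```

If Solr was started by hand with bin/solr start rather than as a service, there is no unit file and nothing will restart it.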

But if it doesn’t restart automatically now, I think it’s easier to troubleshoot: just check the last logs after it crashed.

Best regards,
Radu

https://sematext.com

> On 8 Jun 2020, at 16:28, Ryan W <[hidden email]> wrote:
>
> "If Solr auto-restarts"
>
> It doesn't auto-restart.  Is there some auto-restart functionality?  I'm
> not aware of that.

Re: How to determine why solr stops running?

Shawn Heisey-2
On 5/14/2020 7:22 AM, Ryan W wrote:
> I manage a site where solr has stopped running a couple times in the past
> week. The server hasn't been rebooted, so that's not the reason.  What else
> causes solr to stop running?  How can I investigate why this is happening?

Any situation where Solr stops running and nobody requested the stop is
a result of a serious problem that must be thoroughly investigated.  I
think it's a bad idea for Solr to automatically restart when it stops
unexpectedly.  Chances are that whatever caused the crash is going to
simply make the crash happen again until the problem is solved.
Automatically restarting could hide problems from the system administrator.

The only way a Solr auto-restart would be acceptable to me is if it
sends a high priority alert to the sysadmin EVERY time it executes an
auto-restart.  It really is that bad of a problem.

The causes of Solr crashes (that I can think of) include the following.
I believe I have listed these four options from most likely to least likely:

* Java OutOfMemoryError exceptions.  On non-Windows systems, the
"bin/solr" script starts Solr with an option that results in Solr's
death anytime one of these exceptions occurs.  We do this because
program operation is indeterminate and completely unpredictable when
OOME occurs, so it's far safer to stop running.  That exception can be
caused by several things, some of which actually do not involve memory
at all.  If you're running on Windows via the bin\solr.cmd command, then
this will not happen ... but OOME could still cause a crash, because as
I already mentioned, program operation is unpredictable when OOME occurs.

* The OS kills Solr because system memory is completely exhausted and
Solr is the process using the most memory.  Linux calls this the
"oom-killer" ... I am pretty sure something like it exists on most
operating systems.

* Corruption somewhere in the system.  Could be in Java, the OS, Solr,
or data used by any of those.

* A very serious bug in Solr's code that we haven't discovered yet.

I included that last one simply for completeness.  A bug that causes a
crash *COULD* exist, but as of right now, we have not seen any
supporting evidence.
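
For the second item in the list, the kernel log is the place to look. A minimal sketch, assuming a Linux system; the exact wording of kernel messages varies between versions, so the patterns below are a best guess rather than a complete list, and the name oom_killer_lines is hypothetical:

```shell
# Hypothetical filter: keep only kernel-log lines that look like OOM-killer
# activity. It reads stdin, so it works with dmesg, journalctl -k, or a file.
oom_killer_lines() {
    grep -i -E 'out of memory|oom-killer|oom_kill|killed process' || true
}

# Example calls (either may need root):
#   dmesg | oom_killer_lines
#   journalctl -k --no-pager | oom_killer_lines
```

A line naming a java process around the time Solr disappeared would point at the OS rather than at Solr itself.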

My guess is that Java OutOfMemoryError is the cause here, but I can't be
certain.  If that is happening, then some resource (which might not be
memory) is fully depleted.  We would need to see the full
OutOfMemoryError exception in order to determine why it is happening.
Sometimes the exception is logged in solr.log, sometimes it isn't.  We
cannot predict what part of the code will be running when OOME occurs,
so it would be nearly impossible for us to guarantee logging.  OOME can
happen ANYWHERE - even in code that the compiler thinks is immune to
exceptions.
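
To illustrate the point that the depleted resource might not be memory, here is a rough sketch that sorts an OutOfMemoryError message into a category. The message fragments are ones commonly printed by HotSpot JVMs as far as I know; the category names and the helper name oome_kind are made up for illustration:

```shell
# Hypothetical helper: classify an OutOfMemoryError message by the resource
# that was probably exhausted. $1 = the exception message line.
oome_kind() {
    case "$1" in
        *"Java heap space"*|*"GC overhead limit exceeded"*) echo "heap" ;;
        *"Metaspace"*) echo "metaspace" ;;
        *"Direct buffer memory"*) echo "direct-buffers" ;;
        # Usually a thread or process limit rather than heap.
        *"native thread"*) echo "threads-or-process-limit" ;;
        *) echo "other" ;;
    esac
}

# Example call:
#   oome_kind 'java.lang.OutOfMemoryError: unable to create new native thread'
```

Only the "heap" category is likely to respond to a larger heap setting; the others call for looking at different limits.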

Side note to fellow committers:  I wonder if we should implement an
uncaught exception handler in Solr.  I have found in my own programs
that it helps figure out thorny problems.  And while I am on the subject
of handlers that might not be general knowledge, I didn't find a
shutdown hook or a security manager outside of tests.

Thanks,
Shawn

Re: How to determine why solr stops running?

David Hastings
I’ll add that whenever I’ve had a Solr instance shut down, for me it’s been a hardware failure. Either the RAM or the disk got a “glitch”. Both are relatively fragile, wear-and-tear parts of the machine, and should be expected to fail and be replaced from time to time. Solr is pretty aggressive with its logging, so there are always a lot of writes happening, and of course reads. If the disk or the memory has any issues, it can lock Solr up and bring it down, more so if you have any spellcheck dictionaries or suggesters being built on startup.

Just my experience with this, could be wrong (most likely wrong) but we always have extra drives and memory around the server room for this reason.  At least once or twice a year we will have a disk failure in the raid and need to swap in a new one.

Good luck though. Solr should also be logging its failures, so it would be good to look there too.


Re: How to determine why solr stops running?

Erick Erickson
To add to what Dave said, if you have a particular machine that’s prone to
suddenly stopping, that’s usually a red flag that you should seriously
think about hardware issues.

If the problem strikes different machines, then I agree with Shawn that
the first thing I’d be suspicious of is OOM errors.

FWIW,
Erick



Re: How to determine why solr stops running?

Ryan W
Hi all,

People keep suggesting I check the logs for errors.  What do those errors
look like?  Does anyone have examples of the text of a Solr oom error?  Or
the text of any other errors I should be looking for the next time solr
fails?  Are there phrases I should grep for in the logs?  Should I be
looking in the Solr logs for an OOM error, or in the Apache logs?

There is nothing failing on the server except for solr -- at least not that
I can see.  There is no apparent problem with the hardware or anything else
on the server.  The OS is Red Hat Enterprise Linux. The server has 16 GB of
RAM and hosts one website that does not get a huge amount of traffic.

When the start command is given to solr, does it first check to see if solr
is running, or does it always start solr whether it is already running or
not?

Many thanks!
Ryan



Re: How to determine why solr stops running?

Hup Chen
I would check "dmesg" first, to look for any hardware error messages.
Then use some system admin tools to monitor the server,
for instance top, vmstat, lsof, iostat ... or simply install a
free monitoring tool on the system, such as monit, monitorix, or nagios.
Good luck!
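
As a rough illustration of the kind of check those tools perform, here is a minimal Python sketch that parses /proc/meminfo-style text and reports how much memory the kernel estimates is still available. It assumes a Linux kernel that exposes the MemTotal and MemAvailable fields; the function names are made up for this example, and the sample numbers are invented, not taken from the server in question.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo field name to its integer value (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def available_fraction(fields):
    """Fraction of physical memory the kernel estimates is still available."""
    return fields["MemAvailable"] / fields["MemTotal"]


# Invented sample data in the /proc/meminfo layout (hypothetical values).
sample = (
    "MemTotal:       16000000 kB\n"
    "MemFree:            1000 kB\n"
    "MemAvailable:    4000000 kB\n"
    "HugePages_Total:       0\n"
)
print(f"available: {available_fraction(parse_meminfo(sample)):.0%}")
```

On a real system the text would come from reading /proc/meminfo; running the check from cron and logging the result would leave a history to compare against the times Solr stops.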


Re: How to determine why solr stops running?

Shawn Heisey-2
In reply to this post by Ryan W
On 6/10/2020 12:13 PM, Ryan W wrote:
> People keep suggesting I check the logs for errors.  What do those errors
> look like?  Does anyone have examples of the text of a Solr oom error?  Or
> the text of any other errors I should be looking for the next time solr
> fails?  Are there phrases I should grep for in the logs?  Should I be
> looking in the Solr logs for an OOM error, or in the Apache logs?

Are you running Solr on Windows?  If you are, then a Java OOME will NOT
cause Solr to stop.  On pretty much any other operating system, Solr
will terminate when OOME occurs.  This termination will create a
separate logfile, one that contains very little actual information;
really the only thing it says is that the oom killer script was
executed.  That logfile will have a filename like the following:

solr_oom_killer-8983-2019-08-11_22_57_56.log

If OOME is the reason Solr stops running, then the only place that
exception will be logged is solr.log as far as I know ... but there
exists a very real possibility that it won't actually be logged.  It
could occur at a place in the code that does not have any logging.
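
A minimal sketch of that search in Python, for anyone who wants to automate it.  It assumes the killer logfile follows the "solr_oom_killer-*.log" pattern shown above and that the main log and any rotated copies match "solr.log*"; the function name is made up for this example, and the logs directory has to be supplied by the caller:

```python
import glob
import os


def find_oom_evidence(log_dir):
    """Collect OOM-killer log files and OutOfMemoryError lines under log_dir."""
    # Files written by the OOM killer script, if it ever ran.
    killer_logs = sorted(glob.glob(os.path.join(log_dir, "solr_oom_killer-*.log")))

    # Lines in solr.log (and rotated copies) that mention the exception.
    hits = []
    for path in sorted(glob.glob(os.path.join(log_dir, "solr.log*"))):
        with open(path, errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if "OutOfMemoryError" in line:
                    hits.append((path, number, line.rstrip()))
    return killer_logs, hits
```

An empty result from both lists does not prove that no OOME happened, for the reason given above: the exception may never have been logged.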

At the URL below is an example of a logged OOME on a Solr server.  In
this case, it wasn't memory that was exhausted; the error reports an
inability to start a new thread:

https://paste.apache.org/aznyg

Thanks,
Shawn

Re: How to determine why solr stops running?

Ryan W
In reply to this post by Hup Chen
On Wed, Jun 10, 2020 at 8:35 PM Hup Chen <[hidden email]> wrote:

> I will check "dmesg" first, to find out any hardware error message.
>

Here is what I see toward the end of the output from dmesg:

[1521232.781785] [118857]    48 118857   108785      677     201      901             0 httpd
[1521232.781787] [118860]    48 118860   108785      710     201      881             0 httpd
[1521232.781788] [118862]    48 118862   113063     5256     210      725             0 httpd
[1521232.781790] [118864]    48 118864   114085     6634     212      703             0 httpd
[1521232.781791] [118871]    48 118871   139687    32323     262      620             0 httpd
[1521232.781793] [118873]    48 118873   108785      821     201      792             0 httpd
[1521232.781795] [118879]    48 118879   140263    32719     263      621             0 httpd
[1521232.781796] [118903]    48 118903   108785      812     201      771             0 httpd
[1521232.781798] [118905]    48 118905   113575     5606     211      660             0 httpd
[1521232.781800] [118906]    48 118906   113563     5694     211      626             0 httpd
[1521232.781801] Out of memory: Kill process 117529 (httpd) score 9 or sacrifice child
[1521232.782908] Killed process 117529 (httpd), UID 48, total-vm:675824kB, anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB

Is this a relevant "Out of memory" message?  Does this suggest an OOM
situation is the culprit?
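
One way to read this excerpt: the kernel OOM killer did run, but the process it names as the victim here is httpd, not a java process.  A small Python sketch for pulling the victims out of a longer dmesg dump follows.  It assumes the kill line contains the phrase "Killed process <pid> (<name>)" as it does above (the exact wording can differ between kernel versions), and the function name is made up for this example:

```python
import re

# Matches the kernel's kill report, e.g. "Killed process 117529 (httpd)".
KILLED = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def oom_victims(dmesg_text):
    """Return (pid, process name) for each kernel OOM kill found in dmesg output."""
    return [(int(pid), name) for pid, name in KILLED.findall(dmesg_text)]


# Two lines copied from the output above.
sample = (
    "[1521232.781801] Out of memory: Kill process 117529 (httpd) score 9 or sacrifice child\n"
    "[1521232.782908] Killed process 117529 (httpd), UID 48, total-vm:675824kB\n"
)
for pid, name in oom_victims(sample):
    print(pid, name)
```

If "java" ever shows up in that list, the kernel killed Solr's JVM directly, and Solr's own OOM script would not have had a chance to write its log.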

When I grep in the solr logs for oom, I see some entries like this...

./solr_gc.log.4.current:CommandLine flags: -XX:CICompilerCount=4
-XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000
-XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark
-XX:ConcGCThreads=4 -XX:GCLogFileSize=20971520
-XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912
-XX:MaxNewSize=134217728 -XX:MaxTenuringThreshold=8
-XX:MinHeapDeltaBytes=196608 -XX:NewRatio=3 -XX:NewSize=134217728
-XX:NumberOfGCLogFiles=9 -XX:OldPLABSize=16 -XX:OldSize=402653184
-XX:-OmitStackTraceInFastThrow
-XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /opt/solr/server/logs
-XX:ParallelGCThreads=4 -XX:+ParallelRefProcEnabled
-XX:PretenureSizeThreshold=67108864 -XX:+PrintGC
-XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution -XX:SurvivorRatio=4
-XX:TargetSurvivorRatio=90 -XX:ThreadStackSize=256
-XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers
-XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseGCLogFileRotation
-XX:+UseParNewGC

Buried in there I see "OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh". But I
think this is just a setting that indicates what to do in case of an OOM.
And if I look in that oom_solr.sh file, I see it would write an entry to a
solr_oom_kill log. And there is no such log in the logs directory.
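
The same "CommandLine flags" line also records the heap size the JVM was started with.  A minimal Python sketch for converting that value into something readable, assuming the flag appears as -XX:MaxHeapSize=<bytes> the way it does above (the function name is made up for this example):

```python
import re


def max_heap_mib(flags):
    """Return the -XX:MaxHeapSize value from a JVM flag string in MiB, or None."""
    match = re.search(r"-XX:MaxHeapSize=(\d+)", flags)
    if match is None:
        return None
    return int(match.group(1)) / (1024 * 1024)


# Three flags copied from the solr_gc.log line above.
flags = "-XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912 -XX:MaxNewSize=134217728"
print(max_heap_mib(flags))  # prints 512.0
```

So this Solr instance is running with a 512 MiB heap on a server that has 16 GB of RAM.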

Many thanks.





Re: How to determine why solr stops running?

Walter Underwood
1. You have a tiny heap. 536 Megabytes is not enough.
2. I stopped using the CMS GC years ago.

Here is the GC config we use on every one of our 150+ Solr hosts. We’re still on Java 8, but will be upgrading soon.

SOLR_HEAP=8g
# Use G1 GC  -- wunder 2017-01-23
# Settings from https://wiki.apache.org/solr/ShawnHeisey
GC_TUNE=" \
-XX:+UseG1GC \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=8m \
-XX:MaxGCPauseMillis=200 \
-XX:+UseLargePages \
-XX:+AggressiveOpts \
"

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)

> On Jun 11, 2020, at 10:52 AM, Ryan W <[hidden email]> wrote:
>
> On Wed, Jun 10, 2020 at 8:35 PM Hup Chen <[hidden email]> wrote:
>
>> I will check "dmesg" first, to find out any hardware error message.
>>
>
> Here is what I see toward the end of the output from dmesg:
>
> [1521232.781785] [118857]    48 118857   108785      677     201
> 901             0 httpd
> [1521232.781787] [118860]    48 118860   108785      710     201
> 881             0 httpd
> [1521232.781788] [118862]    48 118862   113063     5256     210
> 725             0 httpd
> [1521232.781790] [118864]    48 118864   114085     6634     212
> 703             0 httpd
> [1521232.781791] [118871]    48 118871   139687    32323     262
> 620             0 httpd
> [1521232.781793] [118873]    48 118873   108785      821     201
> 792             0 httpd
> [1521232.781795] [118879]    48 118879   140263    32719     263
> 621             0 httpd
> [1521232.781796] [118903]    48 118903   108785      812     201
> 771             0 httpd
> [1521232.781798] [118905]    48 118905   113575     5606     211
> 660             0 httpd
> [1521232.781800] [118906]    48 118906   113563     5694     211
> 626             0 httpd
> [1521232.781801] Out of memory: Kill process 117529 (httpd) score 9 or
> sacrifice child
> [1521232.782908] Killed process 117529 (httpd), UID 48, total-vm:675824kB,
> anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB
>
> Is this a relevant "Out of memory" message?  Does this suggest an OOM
> situation is the culprit?
>
> When I grep in the solr logs for oom, I see some entries like this...
>
> ./solr_gc.log.4.current:CommandLine flags: -XX:CICompilerCount=4
> -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000
> -XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark
> -XX:ConcGCThreads=4 -XX:GCLogFileSize=20971520
> -XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912
> -XX:MaxNewSize=134217728 -XX:MaxTenuringThreshold=8
> -XX:MinHeapDeltaBytes=196608 -XX:NewRatio=3 -XX:NewSize=134217728
> -XX:NumberOfGCLogFiles=9 -XX:OldPLABSize=16 -XX:OldSize=402653184
> -XX:-OmitStackTraceInFastThrow
> -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /opt/solr/server/logs
> -XX:ParallelGCThreads=4 -XX:+ParallelRefProcEnabled
> -XX:PretenureSizeThreshold=67108864 -XX:+PrintGC
> -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
> -XX:+PrintTenuringDistribution -XX:SurvivorRatio=4
> -XX:TargetSurvivorRatio=90 -XX:ThreadStackSize=256
> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers
> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseGCLogFileRotation
> -XX:+UseParNewGC
>
> Buried in there I see "OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh". But I
> think this is just a setting that indicates what to do in case of an OOM.
> And if I look in that oom_solr.sh file, I see it would write an entry to a
> solr_oom_kill log. And there is no such log in the logs directory.
>
> Many thanks.
>
>
>
>
>> Then use some system admin tools to monitor that server,
>> for instance, top, vmstat, lsof, iostat ... or simply install some nice
>> free monitoring tool into this system, like monit, monitorix, nagios.
>> Good luck!
>>
>> ________________________________
>> From: Ryan W <[hidden email]>
>> Sent: Thursday, June 11, 2020 2:13 AM
>> To: [hidden email] <[hidden email]>
>> Subject: Re: How to determine why solr stops running?
>>
>> Hi all,
>>
>> People keep suggesting I check the logs for errors.  What do those errors
>> look like?  Does anyone have examples of the text of a Solr oom error?  Or
>> the text of any other errors I should be looking for the next time solr
>> fails?  Are there phrases I should grep for in the logs?  Should I be
>> looking in the Solr logs for an OOM error, or in the Apache logs?
>>
>> There is nothing failing on the server except for solr -- at least not that
>> I can see.  There is no apparent problem with the hardware or anything else
>> on the server.  The OS is Red Hat Enterprise Linux. The server has 16 GB of
>> RAM and hosts one website that does not get a huge amount of traffic.
>>
>> When the start command is given to solr, does it first check to see if solr
>> is running, or does it always start solr whether it is already running or
>> not?
>>
>> Many thanks!
>> Ryan
>>
>>
>> On Tue, Jun 9, 2020 at 7:58 AM Erick Erickson <[hidden email]>
>> wrote:
>>
>>> To add to what Dave said, if you have a particular machine that’s prone to
>>> suddenly stopping, that’s usually a red flag that you should seriously
>>> think about hardware issues.
>>>
>>> If the problem strikes different machines, then I agree with Shawn that
>>> the first thing I’d be suspicious of is OOM errors.
>>>
>>> FWIW,
>>> Erick
>>>
>>>> On Jun 9, 2020, at 6:05 AM, Dave <[hidden email]> wrote:
>>>>
>>>> I’ll add that whenever I’ve had a solr instance shut down, for me it’s
>>>> been a hardware failure. Either the ram or the disk got a “glitch” and both
>>>> of these are relatively fragile and wear and tear type parts of the
>>>> machine, and should be expected to fail and be replaced from time to time.
>>>> Solr is pretty aggressive with its logging so there are a lot of writes
>>>> always happening and of course reads, if the disk has any issues or the
>>>> memory it can lock it up and bring her down, more so if you have any
>>>> spellcheck dictionaries or suggesters being built on start up.
>>>>
>>>> Just my experience with this, could be wrong (most likely wrong) but we
>>>> always have extra drives and memory around the server room for this
>>>> reason.  At least once or twice a year we will have a disk failure in the
>>>> raid and need to swap in a new one.
>>>>
>>>> Good luck though, also solr should be logging its failures so it would
>>>> be good to look there too
>>>>> On Jun 9, 2020, at 2:35 AM, Shawn Heisey <[hidden email]> wrote:
>>>>>
>>>>> On 5/14/2020 7:22 AM, Ryan W wrote:
>>>>>> I manage a site where solr has stopped running a couple times in the past
>>>>>> week. The server hasn't been rebooted, so that's not the reason.  What else
>>>>>> causes solr to stop running?  How can I investigate why this is happening?
>>>>>
>>>>> Any situation where Solr stops running and nobody requested the stop is
>>>>> a result of a serious problem that must be thoroughly investigated.  I
>>>>> think it's a bad idea for Solr to automatically restart when it stops
>>>>> unexpectedly.  Chances are that whatever caused the crash is going to
>>>>> simply make the crash happen again until the problem is solved.
>>>>> Automatically restarting could hide problems from the system administrator.
>>>>>
>>>>> The only way a Solr auto-restart would be acceptable to me is if it
>>>>> sends a high priority alert to the sysadmin EVERY time it executes an
>>>>> auto-restart.  It really is that bad of a problem.
>>>>>
>>>>> The causes of Solr crashes (that I can think of) include the following.
>>>>> I believe I have listed these four options from most likely to least likely:
>>>>>
>>>>> * Java OutOfMemoryError exceptions.  On non-windows systems, the
>>>>> "bin/solr" script starts Solr with an option that results in Solr's death
>>>>> anytime one of these exceptions occurs.  We do this because program
>>>>> operation is indeterminate and completely unpredictable when OOME occurs,
>>>>> so it's far safer to stop running.  That exception can be caused by several
>>>>> things, some of which actually do not involve memory at all.  If you're
>>>>> running on Windows via the bin\solr.cmd command, then this will not happen
>>>>> ... but OOME could still cause a crash, because as I already mentioned,
>>>>> program operation is unpredictable when OOME occurs.
>>>>>
>>>>> * The OS kills Solr because system memory is completely exhausted and
>>>>> Solr is the process using the most memory.  Linux calls this the
>>>>> "oom-killer" ... I am pretty sure something like it exists on most
>>>>> operating systems.
>>>>>
>>>>> * Corruption somewhere in the system.  Could be in Java, the OS, Solr,
>>>>> or data used by any of those.
>>>>>
>>>>> * A very serious bug in Solr's code that we haven't discovered yet.
>>>>>
>>>>> I included that last one simply for completeness.  A bug that causes a
>>>>> crash *COULD* exist, but as of right now, we have not seen any supporting
>>>>> evidence.
>>>>>
>>>>> My guess is that Java OutOfMemoryError is the cause here, but I can't
>>>>> be certain.  If that is happening, then some resource (which might not be
>>>>> memory) is fully depleted.  We would need to see the full OutOfMemoryError
>>>>> exception in order to determine why it is happening. Sometimes the
>>>>> exception is logged in solr.log, sometimes it isn't.  We cannot predict
>>>>> what part of the code will be running when OOME occurs, so it would be
>>>>> nearly impossible for us to guarantee logging.  OOME can happen ANYWHERE -
>>>>> even in code that the compiler thinks is immune to exceptions.
>>>>>
>>>>> Side note to fellow committers:  I wonder if we should implement an
>>>>> uncaught exception handler in Solr.  I have found in my own programs that
>>>>> it helps figure out thorny problems.  And while I am on the subject of
>>>>> handlers that might not be general knowledge, I didn't find a shutdown hook
>>>>> or a security manager outside of tests.
>>>>>
>>>>> Thanks,
>>>>> Shawn
>>>
>>>
>>
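
The quoted messages above distinguish two different "out of memory" events: the Linux oom-killer ending a process, and a Java OutOfMemoryError inside Solr. A minimal sketch of how one might search for each is below. The function names oom_victims and jvm_oom_lines are made up for illustration, and the usage lines in the comments assume the log directory mentioned earlier in the thread (/opt/solr/server/logs).

```shell
#!/bin/sh
# oom_victims (hypothetical helper): read kernel log text on stdin and print
# "PID NAME" for every process the kernel reports as killed by the oom-killer.
oom_victims() {
    sed -n -E 's/.*Killed process ([0-9]+) \(([^)]+)\).*/\1 \2/p'
}

# jvm_oom_lines (hypothetical helper): print lines that mention the Java
# exception class. Reads the files given as arguments, or stdin if none.
jvm_oom_lines() {
    grep -h -E 'java\.lang\.OutOfMemoryError' "$@"
}

# Possible usage on a live host (reading the kernel log may need root):
#   dmesg | oom_victims
#   jvm_oom_lines /opt/solr/server/logs/solr.log*
```

Run against the dmesg excerpt quoted above, oom_victims reports PID 117529 with the name httpd. That record shows the kernel ran out of memory and killed an Apache worker; a kill of Solr itself would be expected to name a java process instead. The JVM messages usually continue after the class name with a reason such as "Java heap space", "GC overhead limit exceeded" or "unable to create new native thread", and the last of these is an example of an OOME that is not about heap memory.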


Re: How to determine why solr stops running?

Ryan W
Thank you.  I pasted those settings at the end of my /etc/default/solr.in.sh
just now and restarted solr.  I will see if that fixes it.  Previously, I
had no settings at all in solr.in.sh except for SOLR_PORT.
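
One way to confirm that the new solr.in.sh settings took effect after the restart is to look at the flags of the running JVM. The sketch below is only an illustration: jvm_heap_flags is a made-up helper name, and the usage line assumes Solr was started through start.jar, which is how bin/solr launches it as far as I know. My understanding is that SOLR_HEAP=8g shows up as -Xms8g and -Xmx8g on the java command line.

```shell
#!/bin/sh
# jvm_heap_flags (hypothetical helper): read a java command line on stdin and
# print only the heap-size flags and the garbage-collector selection flags,
# one per line.
jvm_heap_flags() {
    tr ' ' '\n' | grep -E '^-Xm[sx]|^-XX:\+Use[A-Za-z0-9]+GC$'
}

# Possible usage on a live host (assumes a single Solr JVM started via start.jar):
#   ps -o args= -p "$(pgrep -f start.jar | head -n 1)" | jvm_heap_flags
```

If the output still lists -XX:+UseConcMarkSweepGC or a 512m heap, the file that was edited is probably not the one the service reads at startup.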

On Thu, Jun 11, 2020 at 1:59 PM Walter Underwood <[hidden email]>
wrote:

> 1. You have a tiny heap. 536 Megabytes is not enough.
> 2. I stopped using the CMS GC years ago.
>
> Here is the GC config we use on every one of our 150+ Solr hosts. We’re
> still on Java 8, but will be upgrading soon.
>
> SOLR_HEAP=8g
> # Use G1 GC  -- wunder 2017-01-23
> # Settings from https://wiki.apache.org/solr/ShawnHeisey
> GC_TUNE=" \
> -XX:+UseG1GC \
> -XX:+ParallelRefProcEnabled \
> -XX:G1HeapRegionSize=8m \
> -XX:MaxGCPauseMillis=200 \
> -XX:+UseLargePages \
> -XX:+AggressiveOpts \
> "
>
> wunder
> Walter Underwood
> [hidden email]
> http://observer.wunderwood.org/  (my blog)
>
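
For reference, the heap size in the GC log is given in bytes: -XX:MaxHeapSize=536870912 is 512 MiB (about 537 MB in decimal units). A small sketch for reading that value out of the "CommandLine flags" line follows. heap_mib is a made-up helper name, and the log path in the usage comment is the one from the grep output quoted in this thread.

```shell
#!/bin/sh
# heap_mib (hypothetical helper): read JVM flag text on stdin and print each
# -XX:MaxHeapSize value converted from bytes to MiB.
heap_mib() {
    tr ' ' '\n' |
        sed -n -E 's/^-XX:MaxHeapSize=([0-9]+)$/\1/p' |
        while read -r bytes; do
            echo "$((bytes / 1024 / 1024)) MiB"
        done
}

# Possible usage:
#   grep -h 'CommandLine flags' /opt/solr/server/logs/solr_gc.log* | heap_mib
```

Comparing this number before and after a change to SOLR_HEAP is a quick check that the JVM really started with the larger heap.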