SOLR 7.0 DIH out of memory issue with sqlserver

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

SOLR 7.0 DIH out of memory issue with sqlserver

Tanya Bompi
Hi,
  I have the SOLR 7.0 setup with the DataImportHandler connecting to the
sql server db. I keep getting OutOfMemory: Java Heap Space when doing a
full import. The size of the records is around 3 million so not very huge.
I tried the following steps and nothing helped thus far.

1. Setting the "responseBuffering=adaptive;selectMethod=Cursor" in the jdbc
connection string.
2. Setting the batchSize="-1" which hasnt helped
3. Increasing the heap size at solr startup by issuing the command \solr
start -m 1024m -p 8983
Increasing the heap size further doesnt start SOLR instance itself.

I am wondering what could be causing the issue and how to resolve this.
Below is the data-config :

<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"

url="jdbc:sqlserver://SQL10;IntegratedSecurity=True;databaseName=SnapshotData;responseBuffering=adaptive;selectMethod=Cursor"

  user="yyyy"
              />
  <document>

Thanks,
Tanya
Reply | Threaded
Open this post in threaded view
|

Re: SOLR 7.0 DIH out of memory issue with sqlserver

Shawn Heisey-2
On 9/18/2018 4:48 PM, Tanya Bompi wrote:
>    I have the SOLR 7.0 setup with the DataImportHandler connecting to the
> sql server db. I keep getting OutOfMemory: Java Heap Space when doing a
> full import. The size of the records is around 3 million so not very huge.
> I tried the following steps and nothing helped thus far.

See this wiki page:

https://wiki.apache.org/solr/DataImportHandlerFaq

You already have the suggested fix -- setting responseBuffering to
adaptive.  You might try upgrading the driver.  If that doesn't work,
you're probably going to need to talk to Microsoft about what you need
to do differently on the JDBC url.

I did find this page:

https://docs.microsoft.com/en-us/sql/connect/jdbc/using-adaptive-buffering?view=sql-server-2017

This says that when using adaptive buffering, you should avoid using
selectMethod=cursor.  So you should try removing that parameter.

Thanks,
Shawn

Reply | Threaded
Open this post in threaded view
|

Re: SOLR 7.0 DIH out of memory issue with sqlserver

Tanya Bompi
Hi,
  I am using the Microsoft Jdbc driver 6.4 version in Solr 7.4.0 . I have
tried removing the selectMethod=Cursor and still it runs out of heap space.
Do we have anyone who has faced similar issue.

Thanks
Tanya


On Tue, Sep 18, 2018 at 6:38 PM Shawn Heisey <[hidden email]> wrote:

> On 9/18/2018 4:48 PM, Tanya Bompi wrote:
> >    I have the SOLR 7.0 setup with the DataImportHandler connecting to the
> > sql server db. I keep getting OutOfMemory: Java Heap Space when doing a
> > full import. The size of the records is around 3 million so not very
> huge.
> > I tried the following steps and nothing helped thus far.
>
> See this wiki page:
>
> https://wiki.apache.org/solr/DataImportHandlerFaq
>
> You already have the suggested fix -- setting responseBuffering to
> adaptive.  You might try upgrading the driver.  If that doesn't work,
> you're probably going to need to talk to Microsoft about what you need
> to do differently on the JDBC url.
>
> I did find this page:
>
>
> https://docs.microsoft.com/en-us/sql/connect/jdbc/using-adaptive-buffering?view=sql-server-2017
>
> This says that when using adaptive buffering, you should avoid using
> selectMethod=cursor.  So you should try removing that parameter.
>
> Thanks,
> Shawn
>
>
Reply | Threaded
Open this post in threaded view
|

Re: SOLR 7.0 DIH out of memory issue with sqlserver

Erick Erickson
Has this ever worked? IOW, is this something that's changed or has
just never worked?

The obvious first step is to start Solr with more than 1G of memory.
Solr _likes_ memory and a 1G heap is quite small. But you say:
"Increasing the heap size further doesnt start SOLR instance itself.".
How much RAM do you have on your machine? What other programs are
running? You should be able to increase the heap and start Solr if you
have the RAM on your machine so I'd figure out what's behind that
issue first. I regularly start Solr with 16 or 32G of memory on my
local machines, I know of installations running Solr with 60G heaps so
this points to something really odd about your environment.

When you "increase it further", exactly _how_ does Solr fail to start?
What appears in the Solr logs? etc. Really, troubleshoot that issue
first I'd recommend.

If DIH still needs a ridiculous amount of memory, it's usually the
JDBC driver trying to read all the rows into memory at once and you'll
have to explore the jdbc driver settings in detail.

Best,
Erick
On Tue, Sep 18, 2018 at 11:16 PM Tanya Bompi <[hidden email]> wrote:

>
> Hi,
>   I am using the Microsoft Jdbc driver 6.4 version in Solr 7.4.0 . I have
> tried removing the selectMethod=Cursor and still it runs out of heap space.
> Do we have anyone who has faced similar issue.
>
> Thanks
> Tanya
>
>
> On Tue, Sep 18, 2018 at 6:38 PM Shawn Heisey <[hidden email]> wrote:
>
> > On 9/18/2018 4:48 PM, Tanya Bompi wrote:
> > >    I have the SOLR 7.0 setup with the DataImportHandler connecting to the
> > > sql server db. I keep getting OutOfMemory: Java Heap Space when doing a
> > > full import. The size of the records is around 3 million so not very
> > huge.
> > > I tried the following steps and nothing helped thus far.
> >
> > See this wiki page:
> >
> > https://wiki.apache.org/solr/DataImportHandlerFaq
> >
> > You already have the suggested fix -- setting responseBuffering to
> > adaptive.  You might try upgrading the driver.  If that doesn't work,
> > you're probably going to need to talk to Microsoft about what you need
> > to do differently on the JDBC url.
> >
> > I did find this page:
> >
> >
> > https://docs.microsoft.com/en-us/sql/connect/jdbc/using-adaptive-buffering?view=sql-server-2017
> >
> > This says that when using adaptive buffering, you should avoid using
> > selectMethod=cursor.  So you should try removing that parameter.
> >
> > Thanks,
> > Shawn
> >
> >
Reply | Threaded
Open this post in threaded view
|

Re: SOLR 7.0 DIH out of memory issue with sqlserver

Tanya Bompi
Hi Erick,
  Thank you for the follow-up. I have resolved the issue with the increase
in heapSize and I am able to set the SOLR VM to initialize with a  3G heap
size and the subset of 1 mil records was fetched successfully. Although it
fails with the entire 3 mil records. So something is off with the adaptive
buffering setting as I see  it is not helping. And I also set the
autosoftcommit param. I might have to increase the heap size further to see
if it helps. I will keep posted if my issue doesnt resolve.

Thanks,
Tanya

On Wed, Sep 19, 2018 at 8:22 AM Erick Erickson <[hidden email]>
wrote:

> Has this ever worked? IOW, is this something that's changed or has
> just never worked?
>
> The obvious first step is to start Solr with more than 1G of memory.
> Solr _likes_ memory and a 1G heap is quite small. But you say:
> "Increasing the heap size further doesnt start SOLR instance itself.".
> How much RAM do you have on your machine? What other programs are
> running? You should be able to increase the heap and start Solr if you
> have the RAM on your machine so I'd figure out what's behind that
> issue first. I regularly start Solr with 16 or 32G of memory on my
> local machines, I know of installations running Solr with 60G heaps so
> this points to something really odd about your environment.
>
> When you "increase it further", exactly _how_ does Solr fail to start?
> What appears in the Solr logs? etc. Really, troubleshoot that issue
> first I'd recommend.
>
> If DIH still needs a ridiculous amount of memory, it's usually the
> JDBC driver trying to read all the rows into memory at once and you'll
> have to explore the jdbc driver settings in detail.
>
> Best,
> Erick
> On Tue, Sep 18, 2018 at 11:16 PM Tanya Bompi <[hidden email]>
> wrote:
> >
> > Hi,
> >   I am using the Microsoft Jdbc driver 6.4 version in Solr 7.4.0 . I have
> > tried removing the selectMethod=Cursor and still it runs out of heap
> space.
> > Do we have anyone who has faced similar issue.
> >
> > Thanks
> > Tanya
> >
> >
> > On Tue, Sep 18, 2018 at 6:38 PM Shawn Heisey <[hidden email]>
> wrote:
> >
> > > On 9/18/2018 4:48 PM, Tanya Bompi wrote:
> > > >    I have the SOLR 7.0 setup with the DataImportHandler connecting
> to the
> > > > sql server db. I keep getting OutOfMemory: Java Heap Space when
> doing a
> > > > full import. The size of the records is around 3 million so not very
> > > huge.
> > > > I tried the following steps and nothing helped thus far.
> > >
> > > See this wiki page:
> > >
> > > https://wiki.apache.org/solr/DataImportHandlerFaq
> > >
> > > You already have the suggested fix -- setting responseBuffering to
> > > adaptive.  You might try upgrading the driver.  If that doesn't work,
> > > you're probably going to need to talk to Microsoft about what you need
> > > to do differently on the JDBC url.
> > >
> > > I did find this page:
> > >
> > >
> > >
> https://docs.microsoft.com/en-us/sql/connect/jdbc/using-adaptive-buffering?view=sql-server-2017
> > >
> > > This says that when using adaptive buffering, you should avoid using
> > > selectMethod=cursor.  So you should try removing that parameter.
> > >
> > > Thanks,
> > > Shawn
> > >
> > >
>
Reply | Threaded
Open this post in threaded view
|

Re: SOLR 7.0 DIH out of memory issue with sqlserver

Erick Erickson
Tanya:

Good to hear. You probably want to configure hard <autoCommit> as
well, and in your case perhaps with openSearcher=true

Indexing is only half the problem. It's quite possible that what's
happening is your index is just growing and that's pushing the
boundaries of Java heap. What I'm thinking is that DIH may be
importing small batches, but as more docs are added the underlying
index grows by enough to hit an OOM with batch N+1.

You could just move the indexing off to a client which would give you
a way to see which it is (index growth or jdbc config). Here's a
sample:
https://lucidworks.com/2012/02/14/indexing-with-solrj/

Best,
Erick
On Wed, Sep 19, 2018 at 1:22 PM Tanya Bompi <[hidden email]> wrote:

>
> Hi Erick,
>   Thank you for the follow-up. I have resolved the issue with the increase
> in heapSize and I am able to set the SOLR VM to initialize with a  3G heap
> size and the subset of 1 mil records was fetched successfully. Although it
> fails with the entire 3 mil records. So something is off with the adaptive
> buffering setting as I see  it is not helping. And I also set the
> autosoftcommit param. I might have to increase the heap size further to see
> if it helps. I will keep posted if my issue doesnt resolve.
>
> Thanks,
> Tanya
>
> On Wed, Sep 19, 2018 at 8:22 AM Erick Erickson <[hidden email]>
> wrote:
>
> > Has this ever worked? IOW, is this something that's changed or has
> > just never worked?
> >
> > The obvious first step is to start Solr with more than 1G of memory.
> > Solr _likes_ memory and a 1G heap is quite small. But you say:
> > "Increasing the heap size further doesnt start SOLR instance itself.".
> > How much RAM do you have on your machine? What other programs are
> > running? You should be able to increase the heap and start Solr if you
> > have the RAM on your machine so I'd figure out what's behind that
> > issue first. I regularly start Solr with 16 or 32G of memory on my
> > local machines, I know of installations running Solr with 60G heaps so
> > this points to something really odd about your environment.
> >
> > When you "increase it further", exactly _how_ does Solr fail to start?
> > What appears in the Solr logs? etc. Really, troubleshoot that issue
> > first I'd recommend.
> >
> > If DIH still needs a ridiculous amount of memory, it's usually the
> > JDBC driver trying to read all the rows into memory at once and you'll
> > have to explore the jdbc driver settings in detail.
> >
> > Best,
> > Erick
> > On Tue, Sep 18, 2018 at 11:16 PM Tanya Bompi <[hidden email]>
> > wrote:
> > >
> > > Hi,
> > >   I am using the Microsoft Jdbc driver 6.4 version in Solr 7.4.0 . I have
> > > tried removing the selectMethod=Cursor and still it runs out of heap
> > space.
> > > Do we have anyone who has faced similar issue.
> > >
> > > Thanks
> > > Tanya
> > >
> > >
> > > On Tue, Sep 18, 2018 at 6:38 PM Shawn Heisey <[hidden email]>
> > wrote:
> > >
> > > > On 9/18/2018 4:48 PM, Tanya Bompi wrote:
> > > > >    I have the SOLR 7.0 setup with the DataImportHandler connecting
> > to the
> > > > > sql server db. I keep getting OutOfMemory: Java Heap Space when
> > doing a
> > > > > full import. The size of the records is around 3 million so not very
> > > > huge.
> > > > > I tried the following steps and nothing helped thus far.
> > > >
> > > > See this wiki page:
> > > >
> > > > https://wiki.apache.org/solr/DataImportHandlerFaq
> > > >
> > > > You already have the suggested fix -- setting responseBuffering to
> > > > adaptive.  You might try upgrading the driver.  If that doesn't work,
> > > > you're probably going to need to talk to Microsoft about what you need
> > > > to do differently on the JDBC url.
> > > >
> > > > I did find this page:
> > > >
> > > >
> > > >
> > https://docs.microsoft.com/en-us/sql/connect/jdbc/using-adaptive-buffering?view=sql-server-2017
> > > >
> > > > This says that when using adaptive buffering, you should avoid using
> > > > selectMethod=cursor.  So you should try removing that parameter.
> > > >
> > > > Thanks,
> > > > Shawn
> > > >
> > > >
> >