Sql entity processor sortedmapbackedcache out of memory issue

Srinivas Kashyap-2
Hello,

I'm using DIH to index the data, and the structure of the DIH config for the Solr core is like this:

<entity>
16 child entities
</entity>

During indexing, the number of requests made to the database was high (17 queries to process one document), and they used up most of the database connections, which blocked our web application.

To tackle this, we configured SortedMapBackedCache through the cacheImpl parameter to reduce the number of requests to the database.
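The trade-off can be put into numbers. Below is a back-of-the-envelope Java sketch, assuming one root entity with 16 child entities one level deep and a hypothetical parent-row count. It models query counts only, not what DIH does internally.

```java
// Back-of-the-envelope model of SQL queries issued by a DIH full-import
// with one root entity and several child entities (assumed one level deep).
public final class DihQueryCount {

    // Without a cache, every child entity runs its query once per parent row,
    // plus one query for the root entity itself.
    static long uncachedQueries(long parentRows, int childEntities) {
        return 1 + parentRows * childEntities;
    }

    // With a cache such as SortedMapBackedCache, each child query runs once for
    // the whole import; per-parent lookups are then answered from memory.
    static long cachedQueries(int childEntities) {
        return 1 + childEntities;
    }

    public static void main(String[] args) {
        int children = 16;       // as in the config described here
        long parents = 100_000;  // hypothetical number of rows in the parent table
        System.out.println(uncachedQueries(parents, children)); // 1600001
        System.out.println(cachedQueries(children));            // 17
    }
}
```

For a single parent row both models give 17 queries, matching the count above. The gap grows with the number of parent rows, which is what the cache removes, at the cost of holding the child rows in memory.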

<entity name="parententity" pk="PQRS"
        query="SELECT PQRS,PARENT_KEY,L,M,N,O FROM DEF">

    <field name="L" column="L" />
    <field name="M" column="M" />
    <field name="N" column="N" />

    <entity name="childentity1" pk="PQRS"
            query="SELECT A,B,C,D,E,F,CHILD_KEY,MODIFY_TS FROM ABC ORDER BY MODIFY_TS DESC"
            processor="SqlEntityProcessor" cacheImpl="SortedMapBackedCache"
            where="CHILD_KEY=parententity.PARENT_KEY">

        <field name="A" column="A" />
        <field name="B" column="B" />
    </entity>
    .
    .
    .
</entity>

Our system has 8 GB of physical memory (RAM), with 5 GB of it allocated to the JVM. When we run a full-import, only 17 requests are made to the database. However, memory consumption shoots up and the JVM runs out of memory. Whether it runs out depends on the number of records each entity brings into memory. For the Dev and QA environments the above memory configuration is sufficient, but when we move to production we have to increase the memory to around 16 GB of RAM, with 12 GB for the JVM.

Is there any logic or configuration to limit the memory usage?

Thanks and Regards,
Srinivas Kashyap

________________________________
DISCLAIMER:
E-mails and attachments from Bamboo Rose, LLC are confidential.
If you are not the intended recipient, please notify the sender immediately by replying to the e-mail, and then delete it without making copies or using it in any way.
No representation is made that this email or any attachments are free of viruses. Virus scanning is recommended and is the responsibility of the recipient.

Re: Sql entity processor sortedmapbackedcache out of memory issue

Shawn Heisey-2
On 4/8/2019 11:47 PM, Srinivas Kashyap wrote:
> I'm using DIH to index the data and the structure of the DIH is like below for solr core:
>
> <entity>
> 16 child entities
> </entity>
>
> During indexing, since the number of requests being made to database was high(to process one document 17 queries) and was utilizing most of connections of database thereby blocking our web application.

If you have 17 entities, then one document will indeed take 17 queries.
That's the nature of multiple DIH entities.

> To tackle it, we implemented SORTEDMAPBACKEDCACHE with cacheImpl parameter to reduce the number of requests to database.

When you use SortedMapBackedCache on an entity, you are asking Solr to
store the results of the entire query in memory, even if you don't need
all of the results.  If the database has a lot of rows, that's going to
take a lot of memory.
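To make the memory cost concrete, here is a simplified Java model of a sorted-map-backed cache. It is not Solr's SortedMapBackedCache source; the class and method names are hypothetical. It shows that every row returned by the child query is kept on the heap, keyed by the join column, and that per-parent lookups never go back to the database.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Simplified model (not Solr's actual SortedMapBackedCache class) of a cache
// that keeps every row of a child query in memory, keyed by the join key.
public final class SortedMapCacheModel {

    private final SortedMap<String, List<Map<String, Object>>> rowsByKey = new TreeMap<>();

    // Called once per row of the child query: every row is retained.
    void add(String key, Map<String, Object> row) {
        rowsByKey.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
    }

    // Called once per parent row: answered from memory, no database round-trip.
    List<Map<String, Object>> lookup(String parentKey) {
        return rowsByKey.getOrDefault(parentKey, Collections.emptyList());
    }

    // Total rows held on the heap, whether or not any parent ever asks for them.
    int cachedRowCount() {
        int total = 0;
        for (List<Map<String, Object>> rows : rowsByKey.values()) {
            total += rows.size();
        }
        return total;
    }

    public static void main(String[] args) {
        SortedMapCacheModel cache = new SortedMapCacheModel();
        cache.add("K1", Map.of("A", "a1", "B", "b1"));
        cache.add("K1", Map.of("A", "a2", "B", "b2"));
        cache.add("K9", Map.of("A", "a9", "B", "b9")); // no parent uses K9
        System.out.println(cache.lookup("K1").size());  // 2
        System.out.println(cache.cachedRowCount());     // 3
    }
}
```

In this model the row for K9 stays in memory even though no parent ever looks it up, which is the effect described above when the child table is large.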

In your excerpt from the config, your inner entity doesn't have a WHERE
clause, which means that it's going to retrieve all of the rows of the
ABC table for *EVERY* single entry in the DEF table.  That's going to be
exceptionally slow.  Normally the SQL query on inner entities will have
some kind of WHERE clause that limits the results to rows that match the
entry from the outer entity.
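A toy Java model of that difference, counting how many child rows come back to DIH when the inner query runs once per parent row, with and without filtering on the parent's key. The keys are made up, and the model ignores indexes and how the database actually executes the query.

```java
import java.util.List;

// Toy model of how many child rows an inner entity returns to DIH, with and
// without a WHERE clause that filters on the parent's key.
public final class InnerEntityRows {

    static long rowsReturned(List<String> parentKeys, List<String> childKeys, boolean filterOnParentKey) {
        long returned = 0;
        for (String parentKey : parentKeys) {       // the inner query runs once per parent row
            for (String childKey : childKeys) {     // this toy checks every child row (no index)
                if (!filterOnParentKey || childKey.equals(parentKey)) {
                    returned++;                     // row is sent back to DIH
                }
            }
        }
        return returned;
    }

    public static void main(String[] args) {
        List<String> parents = List.of("K1", "K2", "K3");
        List<String> children = List.of("K1", "K1", "K2", "K3", "K3");
        System.out.println(rowsReturned(parents, children, false)); // 15
        System.out.println(rowsReturned(parents, children, true));  // 5
    }
}
```

Without the filter the rows returned grow as (parent rows) × (child table rows); with it, each child row comes back roughly once per matching parent.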

You may need to write a custom indexing program that runs separately
from Solr, possibly on an entirely different server.  That might be a
lot more efficient than DIH.
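A standalone indexer can keep its memory bounded by streaming rows and sending documents in fixed-size batches instead of building everything up front. The sketch below shows only that batching loop. The sink is a hypothetical stand-in for whatever sends documents to Solr (with SolrJ it might wrap SolrClient.add on a collection of documents), and the source iterator is assumed to stream rows, for example from a JDBC ResultSet.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Core loop of a hypothetical standalone indexer: documents are pulled one at a
// time and handed to a sink in fixed-size batches, so at most batchSize
// documents are held by this loop at any moment.
public final class BatchingIndexer {

    static <T> int indexInBatches(Iterator<T> docs, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        int batches = 0;
        while (docs.hasNext()) {
            batch.add(docs.next());
            if (batch.size() == batchSize) {
                sink.accept(batch);
                batch = new ArrayList<>(batchSize); // start fresh; the sent batch is the sink's now
                batches++;
            }
        }
        if (!batch.isEmpty()) {                      // flush the final partial batch
            sink.accept(batch);
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("d1", "d2", "d3", "d4", "d5");
        int n = indexInBatches(docs.iterator(), 2, b -> System.out.println("send " + b));
        System.out.println(n + " batches"); // 3 batches
    }
}
```

The batch size trades memory for round-trips: larger batches mean fewer requests to Solr but more documents held at once.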

Thanks,
Shawn

RE: Sql entity processor sortedmapbackedcache out of memory issue

Srinivas Kashyap-2
Hi Shawn/Mikhail Khludnev,

I was going through the Jira issue https://issues.apache.org/jira/browse/SOLR-4799 and I see that I can do what I intend by specifying the zipper join.

I tried it, but I'm getting the error below:

Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.IllegalArgumentException: expect increasing foreign keys for Relation CHILD_KEY=PARENT.PARENT_KEY got: QA-HQ008880,HQ011782
at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:62)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:246)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:475)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:514)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
... 5 more
Caused by: java.lang.IllegalArgumentException: expect increasing foreign keys for Relation CHILD_KEY=PARENT.PARENT_KEY got: QA-HQ008880,HQ011782
at org.apache.solr.handler.dataimport.Zipper.supplyNextChild(Zipper.java:70)
at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:126)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:74)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)


Below is my DIH config:


<entity name="PARENT" pk="PQRS"
        query="SELECT PQRS,PARENT_KEY,L,M,N,O FROM DEF order by PARENT_KEY DESC">

    <field name="L" column="L" />
    <field name="M" column="M" />
    <field name="N" column="N" />

    <entity name="childentity1" pk="PQRS"
            query="SELECT A,B,C,D,E,F,CHILD_KEY,MODIFY_TS FROM ABC ORDER BY CHILD_KEY  DESC"
            processor="SqlEntityProcessor" join="zipper" where="CHILD_KEY= PARENT.PARENT_KEY">

        <field name="A" column="A" />
        <field name="B" column="B" />
    </entity>
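The error text suggests that the zipper join expects child keys to arrive in increasing order, while both queries in this config sort DESC: "QA-HQ008880" sorts after "HQ011782", so reading them in that order is a decrease. The following simplified Java model of a merge ("zipper") join illustrates the idea. It is not Solr's Zipper class: comparing keys with String.compareTo and checking only the child stream are assumptions, and a database collation may order keys differently from Java's string comparison.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified merge ("zipper") join of a parent stream and a child stream, both
// expected in ascending key order. Models the ordering check only.
public final class ZipperModel {

    // One child row: its join key (e.g. CHILD_KEY) and a payload column.
    static final class Row {
        final String key;
        final String value;

        Row(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Walks both streams once; returns, per parent key, the matching child values.
    static List<List<String>> zip(List<String> parentKeys, List<Row> children) {
        List<List<String>> result = new ArrayList<>();
        int i = 0;                      // cursor into the child stream; only moves forward
        String previousChildKey = null; // last child key consumed, for the ordering check
        for (String parentKey : parentKeys) {
            List<String> matches = new ArrayList<>();
            while (i < children.size()) {
                Row child = children.get(i);
                // A key smaller than the previous one means the child stream is not ascending.
                if (previousChildKey != null && child.key.compareTo(previousChildKey) < 0) {
                    throw new IllegalArgumentException(
                        "expect increasing foreign keys, got: " + previousChildKey + "," + child.key);
                }
                int cmp = child.key.compareTo(parentKey);
                if (cmp > 0) {
                    break;              // belongs to a later parent; keep it for the next one
                }
                if (cmp == 0) {
                    matches.add(child.value);
                }
                previousChildKey = child.key; // consumed (matched, or orphaned if cmp < 0)
                i++;
            }
            result.add(matches);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> ascending = List.of(new Row("HQ011782", "y"), new Row("QA-HQ008880", "x"));
        System.out.println(zip(List.of("HQ011782", "QA-HQ008880"), ascending)); // [[y], [x]]

        List<Row> descending = List.of(new Row("QA-HQ008880", "x"), new Row("HQ011782", "y"));
        try {
            zip(List.of("QA-HQ008880", "HQ011782"), descending);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // expect increasing foreign keys, got: QA-HQ008880,HQ011782
        }
    }
}
```

With both streams ascending, each child row is read once, and a streaming implementation only needs its current position in each stream rather than the whole child table. In this model a parent stream in the wrong order does not raise an error; the children are silently skipped instead.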


Thanks and Regards,
Srinivas Kashyap

-----Original Message-----
From: Shawn Heisey <[hidden email]>
Sent: 09 April 2019 01:27 PM
To: [hidden email]
Subject: Re: Sql entity processor sortedmapbackedcache out of memory issue

[...]

Re: Sql entity processor sortedmapbackedcache out of memory issue

Nitin Kumar
Does caching work with other entity processors, like SolrEntityProcessor?

On Fri 12 Apr, 2019, 3:10 PM Srinivas Kashyap, <[hidden email]>
wrote:

> Hi Shawn/Mikhail Khludnev,
>
> I was going through Jira  https://issues.apache.org/jira/browse/SOLR-4799
> and see, I can do my intended activity by specifying zipper.
> [...]

RE: Sql entity processor sortedmapbackedcache out of memory issue

Srinivas Kashyap-2
In reply to this post by Srinivas Kashyap-2
Hi Shawn, Mikhail,

Any suggestions or pointers for using the zipper algorithm? I'm facing the error below.

Thanks and Regards,
Srinivas Kashyap
******************************************************************************************

From: Srinivas Kashyap <[hidden email]>
Sent: 12 April 2019 03:10 PM
To: [hidden email]
Subject: RE: Sql entity processor sortedmapbackedcache out of memory issue

Hi Shawn/Mikhail Khludnev,

I was going through Jira  https://issues.apache.org/jira/browse/SOLR-4799 and see, I can do my intended activity by specifying zipper.

I tried doing it, however I'm getting error as below:

Caused by: java.lang.IllegalArgumentException: expect increasing foreign keys for Relation CHILD_KEY=PARENT.PARENT_KEY got: QA-HQ008880,HQ011782
[...]