[jira] Created: (SOLR-1623) Solr hangs (often throwing java.lang.OutOfMemoryError: PermGen space) when indexing many different field names

Prajeeth Emanuel (Jira)
Solr hangs (often throwing java.lang.OutOfMemoryError: PermGen space) when indexing many different field names
--------------------------------------------------------------------------------------------------------------

                 Key: SOLR-1623
                 URL: https://issues.apache.org/jira/browse/SOLR-1623
             Project: Solr
          Issue Type: Bug
          Components: update
    Affects Versions: 1.4, 1.3
         Environment: Tomcat Version         JVM Version                      JVM Vendor                    OS Name OS Version        OS Architecture
Apache Tomcat/6.0   snapshot 1.6.0_13-b03 Sun Microsystems Inc. Linux         2.6.18-164.el5  amd64

and/or

Tomcat Version                JVM Version         JVM Vendor                        OS Name               OS Version    OS Architecture
Apache Tomcat/6.0.18   1.6.0_12-b04        Sun Microsystems Inc.     Windows 2003     5.2                   amd64

            Reporter: Laurent Chavet
            Priority: Critical


With the following fields in schema.xml:

 <fields>
   <field name="id" type="sint" indexed="true" stored="true" required="true" />
   <dynamicField name="weight_*" type="sint" indexed="true" stored="true" />
 </fields>


Run the following code:


import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Class wrapper added so the snippet compiles; the class name is arbitrary.
public class ManyFieldNamesRepro {
    // args[0] is the URL of the Solr server.
    public static void main(String[] args) throws Exception {
        SolrServer server;
        try {
            server = new CommonsHttpSolrServer(args[0]);
        } catch (Exception e) {
            System.err.println("can't create server using: " + args[0] + "  " + e.getMessage());
            throw e;
        }
        // 1000 batches of 1000 documents, each document with its own unique field name
        for (int i = 0; i < 1000; i++) {
            List<SolrInputDocument> batchedDocs = new ArrayList<SolrInputDocument>();
            for (int j = 0; j < 1000; j++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", i * 1000 + j);
                // hangs after 30 to 50 batches
                doc.addField("weight_aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" + Integer.toString(i) + "_" + Integer.toString(j), i * 1000 + j);
                // hangs after about 200 batches
                //doc.addField("weight_" + Integer.toString(i) + "_" + Integer.toString(j), i * 1000 + j);
                batchedDocs.add(doc);
            }
            try {
                server.add(batchedDocs, true);
                System.err.println("Done with batch=" + i);
                // server.commit(); //doesn't change anything
            } catch (Exception e) {
                System.err.println("batchId=" + i + " bad batch: " + e.getMessage());
                throw e;
            }
        }
    }
}

Soon the client (which sometimes throws) and Solr both freeze. Sometimes you can see java.lang.OutOfMemoryError: PermGen space in the server logs.


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-1623) Solr hangs (often throwing java.lang.OutOfMemoryError: PermGen space) when indexing many different field names

Prajeeth Emanuel (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786161#action_12786161 ]

Yonik Seeley commented on SOLR-1623:
------------------------------------

This is most likely due to interning of field names.  If you really need that many field names, the only option right now is to increase the size of the perm gen.
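
To illustrate what interning means here, a minimal sketch follows. It is not Solr or Lucene code; the class name InternDemo and the "weight_<i>" naming pattern are made up for illustration. String.intern() returns one canonical instance per distinct string value, so every previously unseen field name adds an entry to the JVM's intern pool. On the Sun 1.6 JVMs listed in the report, that pool lives in the permanent generation.

```java
import java.util.IdentityHashMap;
import java.util.Map;

final class InternDemo {
    // Interns n generated field names (hypothetical "weight_<i>" pattern) and
    // returns how many distinct canonical String instances came back.
    static int countCanonical(int n) {
        Map<String, Boolean> seen = new IdentityHashMap<String, Boolean>();
        for (int i = 0; i < n; i++) {
            seen.put(("weight_" + i).intern(), Boolean.TRUE);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        String a = new String("weight_1");
        String b = new String("weight_1");
        // Two separate objects with equal contents.
        System.out.println("same object before intern: " + (a == b));
        // intern() maps both to the single canonical instance.
        System.out.println("same object after intern:  " + (a.intern() == b.intern()));
        // One pool entry per distinct name.
        System.out.println("canonical instances: " + countCanonical(1000));
    }
}
```

With a fixed set of field names the pool stops growing after the first few documents; with a new name per document, as in the repro, it grows with every document.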

[jira] Commented: (SOLR-1623) Solr hangs (often throwing java.lang.OutOfMemoryError: PermGen space) when indexing many different field names

Prajeeth Emanuel (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786167#action_12786167 ]

Mark Miller commented on SOLR-1623:
-----------------------------------

What's odd is that he has it marked as affecting 1.4 as well, but 1.4 doesn't intern to perm gen anymore? Did you really test with 1.4? Or am I missing something?

You can also turn on GC for the perm gen space. It's not a complete solution, but it can help under the right circumstances (likely in combination with a larger perm gen space).
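
Before resizing the perm gen it can help to see how full it actually is. The sketch below uses the standard java.lang.management API to list each memory pool with its used and maximum size; the class name MemoryPoolReport is made up for illustration. Pool names vary by JVM and collector, so which line corresponds to the perm gen is something to check against your own output. The code reports on the JVM it runs in, so to observe Solr it would have to run inside the Tomcat process. As far as I know, the perm gen size on these Sun JVMs is set with -XX:MaxPermSize, and class unloading under the CMS collector is enabled with -XX:+CMSClassUnloadingEnabled; treat both as assumptions about what is meant above rather than a confirmed recipe.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

final class MemoryPoolReport {
    // Builds one line per JVM memory pool: name, used bytes, max bytes
    // (max is -1 when the JVM does not define a limit for that pool).
    static List<String> describePools() {
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            lines.add(pool.getName() + " used=" + usage.getUsed() + " max=" + usage.getMax());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describePools()) {
            System.out.println(line);
        }
    }
}
```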

[jira] Commented: (SOLR-1623) Solr hangs (often throwing java.lang.OutOfMemoryError: PermGen space) when indexing many different field names

Prajeeth Emanuel (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786175#action_12786175 ]

Yonik Seeley commented on SOLR-1623:
------------------------------------

bq. Whats odd is that he has it marked as affects 1.4 as well - but that doesn't intern to perm gen anymore?

The default StringHelper.intern() from Lucene is just a cache - String.intern() is still called.
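
A rough sketch of that point, assuming a simple fixed-size cache in front of String.intern(). This is a hypothetical illustration, not Lucene's actual StringHelper code; the class name CachingInterner and the call counter are made up. A cache hit skips the call to String.intern(), but a miss still falls through to it, so the JVM pool still receives one entry per distinct field name.

```java
final class CachingInterner {
    private final String[] slots;
    private int internCalls = 0; // counts fall-throughs to String.intern()

    CachingInterner(int size) {
        slots = new String[size];
    }

    // Returns the canonical instance for s. A hit returns the cached instance;
    // a miss calls String.intern() and stores the result in the slot.
    synchronized String intern(String s) {
        int slot = (s.hashCode() & 0x7fffffff) % slots.length;
        String cached = slots[slot];
        if (cached != null && cached.equals(s)) {
            return cached;
        }
        internCalls++;
        String canonical = s.intern();
        slots[slot] = canonical;
        return canonical;
    }

    synchronized int internCalls() {
        return internCalls;
    }

    public static void main(String[] args) {
        CachingInterner interner = new CachingInterner(1024);
        for (int i = 0; i < 5000; i++) {
            interner.intern("weight_" + i); // every name is new, so every call misses
        }
        System.out.println("String.intern() calls: " + interner.internCalls());
    }
}
```

The cache only saves work for names that repeat; it does nothing to bound the number of distinct strings that end up interned.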

[jira] Commented: (SOLR-1623) Solr hangs (often throwing java.lang.OutOfMemoryError: PermGen space) when indexing many different field names

Prajeeth Emanuel (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786185#action_12786185 ]

Laurent Chavet commented on SOLR-1623:
--------------------------------------

Yes, this definitely repros in 1.4.

Unfortunately I think I need a lot of fields; here is what I am trying to do:

I want to store news articles and extract many topics for each story, with a score for each topic of each story.

So, for example, a story might have a topic of Crime with a score of 20.

So what I am doing now is storing:

Field: Topic               Value: Crime  indexed="true" stored="true"  (needs to be searched and retrieved)
Field: Weight_Topic_Crime  Value: 20     indexed="true" stored="true"  (needs to be sorted and retrieved)

Because there can be a lot of different values for the Topic field, with this schema we end up with a lot of fields starting with Weight.

Any suggestions on how to achieve the same result in a different way?

Thanks,

Laurent
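
The scheme described in this message can be sketched without any Solr dependency, to show why the number of field names tracks the number of distinct topics. The class and method names are made up for illustration, and treating Topic as a multi-valued field is an assumption; only the field names Topic and Weight_Topic_Crime come from the message.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;

final class TopicFieldNames {
    // One weight field per topic, following the Weight_Topic_<topic> pattern.
    static String weightField(String topic) {
        return "Weight_Topic_" + topic;
    }

    // Fields for one story: a "Topic" field holding all topic names
    // (assumed multi-valued) plus one weight field per topic.
    static Map<String, Object> storyFields(Map<String, Integer> topicScores) {
        Map<String, Object> fields = new LinkedHashMap<String, Object>();
        fields.put("Topic", new ArrayList<String>(topicScores.keySet()));
        for (Map.Entry<String, Integer> e : topicScores.entrySet()) {
            fields.put(weightField(e.getKey()), e.getValue());
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, Integer> scores = new LinkedHashMap<String, Integer>();
        scores.put("Crime", 20);
        System.out.println(storyFields(scores));
    }
}
```

Each new topic value introduces a new weight field name, which is what feeds the interning behaviour discussed earlier in the thread.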
