[jira] [Created] (SOLR-3684) Frequently full gc while do pressure index

[jira] [Created] (SOLR-3684) Frequently full gc while do pressure index

Tim Allison (Jira)
Raintung Li created SOLR-3684:
---------------------------------

             Summary: Frequently full gc while do pressure index
                 Key: SOLR-3684
                 URL: https://issues.apache.org/jira/browse/SOLR-3684
             Project: Solr
          Issue Type: Improvement
          Components: multicore
    Affects Versions: 4.0-ALPHA
         Environment: System: Linux
Java process: 4G memory
Jetty: 1000 threads
Index: 20 field
Core: 5

            Reporter: Raintung Li
            Priority: Critical


We recently tested Solr indexing throughput and performance: 20 fields, all of the ordinary text_general type, 1000 Jetty threads, and 5 cores defined.

After the test had run for some time, the Solr process's throughput dropped very quickly. Investigating the root cause, we found the Java process constantly running full GCs.
A heap dump showed that the dominant objects were StandardTokenizer instances, held in a CloseableThreadLocal by IndexSchema.SolrIndexAnalyzer.

Solr uses PerFieldReuseStrategy as its default component-reuse strategy, which means each field gets its own StandardTokenizer if it uses the standard analyzer, and each StandardTokenizer occupies 32KB of memory for its zzBuffer char array.
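
For illustration, here is a toy model (plain Java, not Lucene's actual classes) of what per-field reuse pins in memory: each thread's cache map is keyed by field name, so every distinct field keeps its own tokenizer, and its ~32KB buffer, alive on that thread.

import java.util.HashMap;
import java.util.Map;

// Toy model of per-field component reuse; a char[16K] (32KB) stands in for
// a StandardTokenizer's zzBuffer.
public class PerFieldReuseModel {
    private static final ThreadLocal<Map<String, Object>> CACHE =
        new ThreadLocal<Map<String, Object>>() {
            @Override protected Map<String, Object> initialValue() {
                return new HashMap<String, Object>();
            }
        };

    static Object componentsFor(String fieldName) {
        Map<String, Object> perThread = CACHE.get();
        Object components = perThread.get(fieldName);
        if (components == null) {
            components = new char[16 * 1024]; // 32KB, as in the heap dump
            perThread.put(fieldName, components);
        }
        return components;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 20; i++) {
            componentsFor("field" + i);
        }
        // 20 entries, ~640KB, on this one thread alone
        System.out.println("cached per-field entries: " + CACHE.get().size());
    }
}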

The worst case: total memory = live threads * cores * fields * 32KB

In our test that is 1000 * 5 * 20 * 32KB = 3.2GB for StandardTokenizer instances, and those objects can only be released when their thread dies.

Suggestion:
Each request is handled by exactly one thread, so each document is analyzed by exactly one thread. Because a thread parses a document's fields one after another, fields of the same type can share a single reused component: when the thread moves on to another field of the same type, only the component's input stream needs to be reset. That saves a lot of memory when many fields share a type.

Total memory then becomes = live threads * cores * (distinct field types) * 32KB
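
A quick back-of-the-envelope check of both formulas (plain Java; the 32KB figure is from the heap dump above, and our schema effectively uses one field type):

public class WorstCaseMemory {
    public static void main(String[] args) {
        long threads = 1000, cores = 5, fields = 20, fieldTypes = 1;
        long tokenizerKB = 32; // zzBuffer char array per StandardTokenizer

        long perFieldKB = threads * cores * fields * tokenizerKB;     // current strategy
        long perTypeKB  = threads * cores * fieldTypes * tokenizerKB; // proposed strategy

        System.out.printf("per-field reuse: %.2f GB%n", perFieldKB / 1e6); // 3.20 GB
        System.out.printf("per-type reuse:  %.2f GB%n", perTypeKB / 1e6);  // 0.16 GB
    }
}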

The source change is simple; I can provide the patch for IndexSchema.java:
// Proposed patch (sketch). It assumes an AnalyzerWrapper constructor that
// accepts a ReuseStrategy; the strategy is hoisted out of SolrIndexAnalyzer
// so it can be instantiated before the super() call.

/**
 * Implementation of {@link ReuseStrategy} that reuses components per field
 * type: the per-thread map is keyed by the field's Analyzer rather than by
 * field name, so all fields sharing an analyzer share one
 * TokenStreamComponents per thread.
 */
private static class SolrFieldReuseStrategy extends ReuseStrategy {

    /** Field name -> analyzer; assigned by the owning analyzer's constructor. */
    Map<String, Analyzer> analyzers;

    @Override
    @SuppressWarnings("unchecked")
    public TokenStreamComponents getReusableComponents(String fieldName) {
      Map<Analyzer, TokenStreamComponents> componentsPerAnalyzer =
          (Map<Analyzer, TokenStreamComponents>) getStoredValue();
      return componentsPerAnalyzer != null
          ? componentsPerAnalyzer.get(analyzers.get(fieldName)) : null;
    }

    @Override
    @SuppressWarnings("unchecked")
    public void setReusableComponents(String fieldName, TokenStreamComponents components) {
      Map<Analyzer, TokenStreamComponents> componentsPerAnalyzer =
          (Map<Analyzer, TokenStreamComponents>) getStoredValue();
      if (componentsPerAnalyzer == null) {
        componentsPerAnalyzer = new HashMap<Analyzer, TokenStreamComponents>();
        setStoredValue(componentsPerAnalyzer);
      }
      componentsPerAnalyzer.put(analyzers.get(fieldName), components);
    }
}

private class SolrIndexAnalyzer extends AnalyzerWrapper {

    // Field name -> Analyzer cache. An instance field: a "final static"
    // field could not be assigned from a constructor.
    protected final HashMap<String, Analyzer> analyzers;

    SolrIndexAnalyzer() {
      this(new SolrFieldReuseStrategy());
    }

    private SolrIndexAnalyzer(SolrFieldReuseStrategy strategy) {
      super(strategy);
      analyzers = analyzerCache();
      strategy.analyzers = analyzers; // the strategy keys reuse by these analyzers
    }

    protected HashMap<String, Analyzer> analyzerCache() {
      HashMap<String, Analyzer> cache = new HashMap<String, Analyzer>();
      for (SchemaField f : getFields().values()) {
        cache.put(f.getName(), f.getType().getAnalyzer());
      }
      return cache;
    }

    @Override
    protected Analyzer getWrappedAnalyzer(String fieldName) {
      Analyzer analyzer = analyzers.get(fieldName);
      return analyzer != null ? analyzer : getDynamicFieldType(fieldName).getAnalyzer();
    }

    @Override
    protected TokenStreamComponents wrapComponents(String fieldName, TokenStreamComponents components) {
      return components;
    }
  }

  private class SolrQueryAnalyzer extends SolrIndexAnalyzer {
    @Override
    protected HashMap<String, Analyzer> analyzerCache() {
      HashMap<String, Analyzer> cache = new HashMap<String, Analyzer>();
      for (SchemaField f : getFields().values()) {
        cache.put(f.getName(), f.getType().getQueryAnalyzer());
      }
      return cache;
    }

    @Override
    protected Analyzer getWrappedAnalyzer(String fieldName) {
      Analyzer analyzer = analyzers.get(fieldName);
      return analyzer != null ? analyzer : getDynamicFieldType(fieldName).getQueryAnalyzer();
    }
  }



[jira] [Updated] (SOLR-3684) Frequently full gc while do pressure index

Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raintung Li updated SOLR-3684:
------------------------------

    Attachment: patch.txt
   



[jira] [Updated] (SOLR-3684) Frequently full gc while do pressure index

Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raintung Li updated SOLR-3684:
------------------------------

         Fix Version/s: 4.0
    Remaining Estimate: 168h
     Original Estimate: 168h
   



[jira] [Commented] (SOLR-3684) Frequently full gc while do pressure index

Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428863#comment-13428863 ]

Robert Muir commented on SOLR-3684:
-----------------------------------


the patch would let Solr reuse analyzers across field types, however:
* the problem is that you are indexing with 1000 threads. Why are you using so many?
  That is the root cause here; in general you can plan on performance problems with that many.
* if someone has a fieldType "myType" and they use it for "field1" and "field2",
  and "myType" is itself a per-field analyzer (it does something different for
  these two fields), then this code will sometimes analyze the fields the wrong way.
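
For example (a hypothetical analyzer, illustrative only, not something from the patch), one Analyzer instance can build different components per field name:

{quote}
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

// Hypothetical: different behavior for different fields of the same type.
class MyAnalyzer extends Analyzer {
  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    Tokenizer source = new StandardTokenizer(Version.LUCENE_40, reader);
    TokenStream sink = source;
    if ("field2".equals(fieldName)) {
      sink = new LowerCaseFilter(Version.LUCENE_40, sink); // only field2 is lowercased
    }
    return new TokenStreamComponents(source, sink);
  }
}
{quote}

With reuse keyed on the Analyzer instance alone, components built for "field1" could be handed back for "field2", silently skipping the extra filter.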
               



[jira] [Commented] (SOLR-3684) Frequently full gc while do pressure index

Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429015#comment-13429015 ]

Raintung Li commented on SOLR-3684:
-----------------------------------

For 1, I wanted to test indexing throughput in SolrCloud, so I started 1000 threads in JMeter; the SolrCloud server's Jetty max-threads setting is 10000.
In a pressure test, throughput usually reaches its maximum and then holds or declines smoothly to a stable average. In this case the JVM looked hung, constantly running full GC: the cached StandardTokenizers cost too much memory, the threads stayed alive so the cache could not be released, new requests kept coming, and throughput became very bad.

For 2, how would one create that per-field analyzer? Is it the same Analyzer instance? Analyzer.tokenStream is declared final, so how would it create different token streams for different fields? Reusing the same token stream within one thread is safe, since TokenStreamComponents is a per-thread cache. Could you give more information?


               



[jira] [Commented] (SOLR-3684) Frequently full gc while do pressure index

Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429027#comment-13429027 ]

Mikhail Khludnev commented on SOLR-3684:
----------------------------------------

Hello,

Point 1 gives one more use case for SOLR-3585, which uses a dedicated thread pool with limited capacity to process updates. That would solve the core challenge here.

Raintung,
updating with a storm of small messages is not common in the search-engine world. The usual approach is to collect documents into bulks and index them with a modest number of threads. Sooner or later indexing hits the I/O limit, so there is no profit in saturating the CPUs with a huge number of indexing threads.
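
For illustration, a bulk-indexing sketch with SolrJ (4.x-era API; the URL, field names, and batch size of 500 are just examples):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
    for (int i = 0; i < 100000; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", Integer.toString(i));
      doc.addField("text", "document body " + i);
      batch.add(doc);
      if (batch.size() == 500) { // send documents in bulks, not one per request
        server.add(batch);
        batch.clear();
      }
    }
    if (!batch.isEmpty()) {
      server.add(batch);
    }
    server.commit();
    server.shutdown();
  }
}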

               



[jira] [Commented] (SOLR-3684) Frequently full gc while do pressure index

Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429038#comment-13429038 ]

Raintung Li commented on SOLR-3684:
-----------------------------------

Hi Mikhail,

It isn't really a storm: only 1000 clients send messages, we have three Solr index servers, and all of them show the same issue.

My suggestion is simply to reduce wasted memory, even though memory is cheap now. To improve performance and avoid the I/O limit we keep data in memory, but we still need to account for memory usage, even though the JVM manages memory for us.

BTW, the default Jetty thread config in Solr is 10000; in this case every server has more than 1000 live threads.





               



[jira] [Comment Edited] (SOLR-3684) Frequently full gc while do pressure index

Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429038#comment-13429038 ]

Raintung Li edited comment on SOLR-3684 at 8/6/12 9:51 AM:
-----------------------------------------------------------

Hi Mikhail,

It isn't really a storm: only 1000 clients send messages, we have three Solr index servers, and all of them show the same issue.

My suggestion is simply to reduce wasted memory, even though memory is cheap now. To improve performance and avoid the I/O limit we keep data in memory, but we still need to account for memory usage, even though the JVM manages memory for us.

BTW, the default Jetty thread config in Solr is 10000; in this case every server has more than 1000 live threads.





               



[jira] [Commented] (SOLR-3684) Frequently full gc while do pressure index

Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429173#comment-13429173 ]

Robert Muir commented on SOLR-3684:
-----------------------------------

{quote}
BTW, the default Jetty thread config in Solr is 10000;
{quote}

Can we address this default thread config with a patch? This doesn't seem good: I guess if someone doesn't
fix this, I can easily DoS Solrs into eating up all their RAM until rebooted. Something like 100 seems just
fine for QueuedThreadPool, so it will block in such cases (and probably just end up being faster overall).
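
For reference, a sketch of what that would look like in the example jetty.xml (element names as in the Jetty 8-era config Solr ships with; verify against the actual file):

{quote}
<Set name="ThreadPool">
  <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
    <Set name="minThreads">10</Set>
    <Set name="maxThreads">100</Set>
  </New>
</Set>
{quote}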

{quote}
For 2, how would one create that per-field analyzer? Is it the same Analyzer instance? Analyzer.tokenStream is declared final, so how would it create different token streams for different fields? Reusing the same token stream within one thread is safe, since TokenStreamComponents is a per-thread cache. Could you give more information?
{quote}

Well, basically your patch should be a nice improvement about 99.9% of the time. There is a (maybe only theoretical)
case where someone has a Lucene Analyzer, MyAnalyzer, configured as:
{quote}
<fieldType name="text_custom" class="solr.TextField">
  <analyzer class="com.mypackage.MyAnalyzer"/>
</fieldType>
...
<field name="foo" type="text_custom" .../>
<field name="bar" type="text_custom" .../>
...
{quote}

If MyAnalyzer has different behavior for "foo" versus "bar", then reuse-by-field-type will be incorrect. I'll think
about a workaround; maybe nobody is even doing this or depends on it. But I just don't know whether the same thing
could happen for custom fieldtypes or whatever. It's just the kind of thing that could be a sneaky bug in the future.

But I agree with the patch! I'll see if we can address it somehow.

Separately, I think we should also open an issue to reduce these jflex buffer sizes. char[16k] seems like serious
overkill; the other tokenizers in Lucene use char[4k].

               

> Frequently full gc while do pressure index
> ------------------------------------------
>
>                 Key: SOLR-3684
>                 URL: https://issues.apache.org/jira/browse/SOLR-3684
>             Project: Solr
>          Issue Type: Improvement
>          Components: multicore
>    Affects Versions: 4.0-ALPHA
>         Environment: System: Linux
> Java process: 4G memory
> Jetty: 1000 threads
> Index: 20 field
> Core: 5
>            Reporter: Raintung Li
>            Priority: Critical
>              Labels: garbage, performance
>             Fix For: 4.0
>
>         Attachments: patch.txt
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Recently we test the Solr index throughput and performance, configure the 20 fields do test, the field type is normal text_general, start 1000 threads for Jetty, and define 5 cores.
> After test continued for some time, the solr process throughput is down very quickly. After check the root cause, find the java process always do the full GC.
> Check the heap dump, the main object is StandardTokenizer, it is be saved in the CloseableThreadLocal by IndexSchema.SolrIndexAnalyzer.
> In the Solr, will use the PerFieldReuseStrategy for the default reuse component strategy, that means one field has one own StandardTokenizer if it use standard analyzer,  and standardtokenizer will occur 32KB memory because of zzBuffer char array.
> The worst case: Total memory = live threads*cores*fields*32KB
> In the test case, the memory is 1000*5*20*32KB= 3.2G for StandardTokenizer, and those object only thread die can be released.
> Suggestion:
> Every request only handles by one thread that means one document only analyses by one thread.  For one thread will parse the document’s field step by step, so the same field type can use the same reused component. While thread switches the same type’s field analyzes only reset the same component input stream, it can save a lot of memory for same type’s field.
> Total memory will be = live threads*cores*(different fields types)*32KB
> The source code modifies that it is simple; I can provide the modification patch for IndexSchema.java:
> private static class SolrFieldReuseStrategy extends ReuseStrategy {
>
>     // field name -> analyzer, handed over by the owning analyzer once its
>     // cache is built. Declared as a static nested class of IndexSchema:
>     // an inner class could not be instantiated inside the super(...) call
>     // in SolrIndexAnalyzer's constructor below, because the enclosing
>     // instance would not exist yet.
>     Map<String, Analyzer> analyzers;
>
>     /**
>      * Reuses components per field type: the per-thread map is keyed on
>      * the field's Analyzer rather than on the field name, so all fields
>      * sharing an analyzer share a single TokenStreamComponents.
>      */
>     @SuppressWarnings("unchecked")
>     public TokenStreamComponents getReusableComponents(String fieldName) {
>       Map<Analyzer, TokenStreamComponents> componentsPerAnalyzer =
>           (Map<Analyzer, TokenStreamComponents>) getStoredValue();
>       return componentsPerAnalyzer != null ? componentsPerAnalyzer.get(analyzers.get(fieldName)) : null;
>     }
>
>     /** {@inheritDoc} */
>     @SuppressWarnings("unchecked")
>     public void setReusableComponents(String fieldName, TokenStreamComponents components) {
>       Map<Analyzer, TokenStreamComponents> componentsPerAnalyzer =
>           (Map<Analyzer, TokenStreamComponents>) getStoredValue();
>       if (componentsPerAnalyzer == null) {
>         componentsPerAnalyzer = new HashMap<Analyzer, TokenStreamComponents>();
>         setStoredValue(componentsPerAnalyzer);
>       }
>       componentsPerAnalyzer.put(analyzers.get(fieldName), components);
>     }
> }
>
> private class SolrIndexAnalyzer extends AnalyzerWrapper {
>
>     // Instance field, assigned once in the constructor. It must not be
>     // static final as originally posted: a static final field cannot be
>     // assigned in a constructor, and the index and query analyzers each
>     // need their own cache.
>     protected final HashMap<String, Analyzer> analyzers;
>
>     SolrIndexAnalyzer() {
>       this(new SolrFieldReuseStrategy());
>     }
>
>     private SolrIndexAnalyzer(SolrFieldReuseStrategy strategy) {
>       super(strategy);
>       analyzers = analyzerCache();
>       strategy.analyzers = analyzers;
>     }
>
>     protected HashMap<String, Analyzer> analyzerCache() {
>       HashMap<String, Analyzer> cache = new HashMap<String, Analyzer>();
>       for (SchemaField f : getFields().values()) {
>         cache.put(f.getName(), f.getType().getAnalyzer());
>       }
>       return cache;
>     }
>
>     @Override
>     protected Analyzer getWrappedAnalyzer(String fieldName) {
>       Analyzer analyzer = analyzers.get(fieldName);
>       return analyzer != null ? analyzer : getDynamicFieldType(fieldName).getAnalyzer();
>     }
>
>     @Override
>     protected TokenStreamComponents wrapComponents(String fieldName, TokenStreamComponents components) {
>       return components;
>     }
>   }
>
>   private class SolrQueryAnalyzer extends SolrIndexAnalyzer {
>     @Override
>     protected HashMap<String, Analyzer> analyzerCache() {
>       HashMap<String, Analyzer> cache = new HashMap<String, Analyzer>();
>       for (SchemaField f : getFields().values()) {
>         cache.put(f.getName(), f.getType().getQueryAnalyzer());
>       }
>       return cache;
>     }
>
>     @Override
>     protected Analyzer getWrappedAnalyzer(String fieldName) {
>       Analyzer analyzer = analyzers.get(fieldName);
>       return analyzer != null ? analyzer : getDynamicFieldType(fieldName).getQueryAnalyzer();
>     }
>   }
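
To make the quoted arithmetic concrete, here is a small self-contained sketch; the class and constant names are illustrative, and only the two formulas come from the description above:

public class TokenizerMemoryEstimate {
  public static void main(String[] args) {
    long threads = 1000;            // Jetty worker threads
    long cores = 5;                 // Solr cores
    long fields = 20;               // indexed fields per schema
    long fieldTypes = 1;            // distinct field types (all text_general)
    long perTokenizer = 32 * 1024;  // StandardTokenizer zzBuffer, ~32KB

    // Default PerFieldReuseStrategy: one tokenizer per thread/core/field.
    long perFieldTotal = threads * cores * fields * perTokenizer;
    // Proposed per-field-type reuse: one tokenizer per thread/core/type.
    long perTypeTotal = threads * cores * fieldTypes * perTokenizer;

    System.out.printf("per-field reuse: %.1f GB%n", perFieldTotal / 1e9);
    System.out.printf("per-type  reuse: %.1f MB%n", perTypeTotal / 1e6);
  }
}

With the numbers from the report this prints roughly 3.3 GB versus about 164 MB, which is the saving the suggestion is after.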

[jira] [Commented] (SOLR-3684) Frequently full gc while do pressure index

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429287#comment-13429287 ]

Robert Muir commented on SOLR-3684:
-----------------------------------

FYI: I lowered the jflex buffer sizes from 32kb to 8kb in LUCENE-4291.

So I think we should still:
# Address this default jetty threadpool size of max=10,000. This is the real issue (see the sketch after this list).
# See if we can deal with the crazy corner case so we can impl your patch (reuse by fieldtype), which I think is a good separate improvement.
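
For reference, a rough sketch of a capped pool using the embedded Jetty API. The QueuedThreadPool constructor shown is the Jetty 9-style one, which postdates this thread (the Solr example of the era set the same cap via maxThreads in etc/jetty.xml), and the numbers are illustrative, not a tested recommendation:

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.util.thread.QueuedThreadPool;

public class CappedJettySketch {
  public static void main(String[] args) throws Exception {
    // maxThreads=254 (the ballpark mentioned above), minThreads=10,
    // idleTimeout=60000ms; illustrative values only.
    QueuedThreadPool pool = new QueuedThreadPool(254, 10, 60000);
    Server server = new Server(pool);
    // ... connectors and the Solr webapp would be wired up here ...
    server.start();
    server.join();
  }
}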

               


[jira] [Commented] (SOLR-3684) Frequently full gc while do pressure index

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429682#comment-13429682 ]

Yonik Seeley commented on SOLR-3684:
------------------------------------

bq. Address this default jetty threadpool size of max=10,000. This is the real issue.

I had thought that Jetty reused a small number of threads - O(n_concurrent_connections) - regardless of what the max number of threads was?
               


[jira] [Commented] (SOLR-3684) Frequently full gc while do pressure index

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429688#comment-13429688 ]

Robert Muir commented on SOLR-3684:
-----------------------------------

It does: I think the reuse is not the problem, but the max is.

By default I think it keeps the min threads (default 10) alive always, but our max of 10,000 allows it to temporarily
spike huge (versus blocking). From looking at the Jetty code, by default these threads die off after 60s, which is fine,
but we enrolled so many entries into e.g. Analyzer's or SegmentReader's CloseableThreadLocals that when the threads die off
and the CTL does a purge, it's just a ton of garbage.

Really there isn't much benefit in using so many threads at indexing time (DWPT's max thread count is 8 unless changed
in IndexWriterConfig, and raising it would have other bad side effects). At query time I think something closer to Jetty's
default of 254 would actually be better too.

But I looked at the history of this file, and it seems the reason it was set to 10,000 was to prevent a deadlock (SOLR-683)?
Is there a better solution to that now, so that we can reduce this max?

Separately, I've been fixing the analyzers that hog RAM, because machines are getting more cores, so I think it's
worth it. But it would be nice if we could fix this max=10,000.
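
To picture the purge burst described above, here is a toy sketch assuming Lucene's CloseableThreadLocal semantics; the class name, loop count, and buffer size are illustrative, not Solr code:

import org.apache.lucene.util.CloseableThreadLocal;

public class ThreadChurnSketch {
  // Stand-in for the per-thread storage a ReuseStrategy keeps per analyzer.
  static final CloseableThreadLocal<char[]> perThread =
      new CloseableThreadLocal<char[]>();

  public static void main(String[] args) throws Exception {
    // A spiking pool: many short-lived threads, each pinning a ~32KB buffer.
    for (int i = 0; i < 10000; i++) {
      Thread t = new Thread(new Runnable() {
        public void run() {
          perThread.set(new char[32 * 1024]);
        }
      });
      t.start();
      t.join(); // the thread dies, but its entry lingers until the next purge
    }
    // Entries for dead threads are reclaimed lazily; that deferred cleanup is
    // the burst of garbage seen when the pool shrinks back after a spike.
    perThread.close();
  }
}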
               


[jira] [Commented] (SOLR-3684) Frequently full gc while do pressure index

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429985#comment-13429985 ]

Eks Dev commented on SOLR-3684:
-------------------------------

We did this a long time ago on Tomcat, since we use particularly expensive analyzers; even for searching, the optimum is around the number of CPU cores. Actually, that was the only big problem with Solr we had.

Anything that keeps insane thread churn low helps. Not only the max number of threads: the TTL for idle threads should also somehow be increased. The longer threads live, the better. Solr is completely safe here thanks to core reloading and smart index management, so there is no point in renewing threads.

If one needs to queue requests, that is a separate problem, but even then there is no need to raise the max worker threads beyond the number of cores plus some smallish constant.

What we would like to achieve is to keep separate thread pools for searching, indexing, and "the rest"... but we never managed to figure out how to do it. Even benign requests (/ping, /status, whatever) increase thread churn. If we were able to configure separate pools, we could keep a small number of long-living threads for searching, an even smaller number for indexing, and one "who cares" pool for the rest. It is somehow possible on Tomcat; if someone knows how to do it, please share.
               


[jira] [Commented] (SOLR-3684) Frequently full gc while do pressure index

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430293#comment-13430293 ]

Yonik Seeley commented on SOLR-3684:
------------------------------------

bq. What we would like to achieve is to keep separate thread pools for searching, indexing and "the rest".

Yeah, exactly.  I'd love to be able to assign different thread pools to different URLs, but I don't know if that's doable in Jetty or not.
               


[jira] [Commented] (SOLR-3684) Frequently full gc while do pressure index

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430294#comment-13430294 ]

Robert Muir commented on SOLR-3684:
-----------------------------------

What about http://docs.codehaus.org/display/JETTY/Quality+of+Service+Filter ?
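
If the QoSFilter route works, wiring it so that only indexing is throttled might look roughly like this. This is a sketch using embedded-Jetty servlet registration; the path spec and the maxRequests value are assumptions, not something tested in this thread:

import java.util.EnumSet;
import javax.servlet.DispatcherType;
import org.eclipse.jetty.servlet.FilterHolder;
import org.eclipse.jetty.servlet.ServletContextHandler;
import org.eclipse.jetty.servlets.QoSFilter;

public class QoSWiringSketch {
  static void throttleIndexing(ServletContextHandler solrContext) {
    FilterHolder qos = new FilterHolder(QoSFilter.class);
    // Cap concurrent update requests; excess requests are suspended and
    // resumed by the filter instead of each holding a busy thread.
    qos.setInitParameter("maxRequests", "8"); // illustrative cap
    solrContext.addFilter(qos, "/update/*", EnumSet.of(DispatcherType.REQUEST));
  }
}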
 
               


[jira] [Commented] (SOLR-3684) Frequently full gc while do pressure index

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430839#comment-13430839 ]

Raintung Li commented on SOLR-3684:
-----------------------------------

I just checked all the Solr/Lucene analyzers; the entry point is the createComponents method. If different field names produced different TokenStreamComponents, that would invalidate a field-type cache. Is that the concern?
 protected TokenStreamComponents createComponents(String fieldName,
      Reader reader) {
...
}
The fieldName parameter isn't used by any of Solr/Lucene's own analyzers, so maybe we can simply remove the parameter from Analyzer.java. Then we can guarantee that one field type maps to exactly one analyzer.

The other simple way is to define that only analyzers under the Solr/Lucene package paths use the field-type cache, while custom analyzers keep using the field-name cache.

For the separate-thread-pool-per-path issue, maybe different ports can handle it; we do that on Tomcat.
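
For illustration, here is a minimal analyzer with the Lucene 4.0-era createComponents signature referred to above: fieldName is accepted but never read, which is what makes keying the reuse cache on field type safe for analyzers of this shape. A hypothetical sketch, not part of the patch:

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

public class TypeKeyedAnalyzer extends Analyzer {
  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    // fieldName is ignored: components built for one field of this type are
    // interchangeable with any other field of the same type.
    StandardTokenizer source = new StandardTokenizer(Version.LUCENE_40, reader);
    return new TokenStreamComponents(source, new LowerCaseFilter(Version.LUCENE_40, source));
  }
}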
               


[jira] [Updated] (SOLR-3684) Frequently full gc while do pressure index

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raintung Li updated SOLR-3684:
------------------------------

    Attachment:     (was: patch.txt)
   


[jira] [Updated] (SOLR-3684) Frequently full gc while do pressure index

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raintung Li updated SOLR-3684:
------------------------------

    Attachment: patch.txt
   

> Frequently full gc while do pressure index
> ------------------------------------------
>
>                 Key: SOLR-3684
>                 URL: https://issues.apache.org/jira/browse/SOLR-3684
>             Project: Solr
>          Issue Type: Improvement
>          Components: multicore
>    Affects Versions: 4.0-ALPHA
>         Environment: System: Linux
> Java process: 4G memory
> Jetty: 1000 threads
> Index: 20 field
> Core: 5
>            Reporter: Raintung Li
>            Priority: Critical
>              Labels: garbage, performance
>             Fix For: 4.0
>
>         Attachments: patch.txt
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Recently we test the Solr index throughput and performance, configure the 20 fields do test, the field type is normal text_general, start 1000 threads for Jetty, and define 5 cores.
> After test continued for some time, the solr process throughput is down very quickly. After check the root cause, find the java process always do the full GC.
> Check the heap dump, the main object is StandardTokenizer, it is be saved in the CloseableThreadLocal by IndexSchema.SolrIndexAnalyzer.
> In the Solr, will use the PerFieldReuseStrategy for the default reuse component strategy, that means one field has one own StandardTokenizer if it use standard analyzer,  and standardtokenizer will occur 32KB memory because of zzBuffer char array.
> The worst case: Total memory = live threads*cores*fields*32KB
> In the test case, the memory is 1000*5*20*32KB= 3.2G for StandardTokenizer, and those object only thread die can be released.
> Suggestion:
> Every request only handles by one thread that means one document only analyses by one thread.  For one thread will parse the document’s field step by step, so the same field type can use the same reused component. While thread switches the same type’s field analyzes only reset the same component input stream, it can save a lot of memory for same type’s field.
> Total memory will be = live threads*cores*(different fields types)*32KB
> The source code modifies that it is simple; I can provide the modification patch for IndexSchema.java:
> private class SolrIndexAnalyzer extends AnalyzerWrapper {
>  
> private class SolrFieldReuseStrategy extends ReuseStrategy {
>      /**
>       * {@inheritDoc}
>       */
>      @SuppressWarnings("unchecked")
>      public TokenStreamComponents getReusableComponents(String fieldName) {
>        Map<Analyzer, TokenStreamComponents> componentsPerField = (Map<Analyzer, TokenStreamComponents>) getStoredValue();
>        return componentsPerField != null ? componentsPerField.get(analyzers.get(fieldName)) : null;
>      }
>      /**
>       * {@inheritDoc}
>       */
>      @SuppressWarnings("unchecked")
>      public void setReusableComponents(String fieldName, TokenStreamComponents components) {
>        Map<Analyzer, TokenStreamComponents> componentsPerField = (Map<Analyzer, TokenStreamComponents>) getStoredValue();
>        if (componentsPerField == null) {
>          componentsPerField = new HashMap<Analyzer, TokenStreamComponents>();
>          setStoredValue(componentsPerField);
>        }
>        componentsPerField.put(analyzers.get(fieldName), components);
>      }
> }
>
>     protected final static HashMap<String, Analyzer> analyzers;
>     /**
>      * Implementation of {@link ReuseStrategy} that reuses components per-field by
>      * maintaining a Map of TokenStreamComponent per field name.
>      */
>    
>     SolrIndexAnalyzer() {
>       super(new solrFieldReuseStrategy());
>       analyzers = analyzerCache();
>     }
>     protected HashMap<String, Analyzer> analyzerCache() {
>       HashMap<String, Analyzer> cache = new HashMap<String, Analyzer>();
>       for (SchemaField f : getFields().values()) {
>         Analyzer analyzer = f.getType().getAnalyzer();
>         cache.put(f.getName(), analyzer);
>       }
>       return cache;
>     }
>     @Override
>     protected Analyzer getWrappedAnalyzer(String fieldName) {
>       Analyzer analyzer = analyzers.get(fieldName);
>       return analyzer != null ? analyzer : getDynamicFieldType(fieldName).getAnalyzer();
>     }
>     @Override
>     protected TokenStreamComponents wrapComponents(String fieldName, TokenStreamComponents components) {
>       return components;
>     }
>   }
> private class SolrQueryAnalyzer extends SolrIndexAnalyzer {
>
>   @Override
>   protected HashMap<String, Analyzer> analyzerCache() {
>     HashMap<String, Analyzer> cache = new HashMap<String, Analyzer>();
>     for (SchemaField f : getFields().values()) {
>       cache.put(f.getName(), f.getType().getQueryAnalyzer());
>     }
>     return cache;
>   }
>
>   @Override
>   protected Analyzer getWrappedAnalyzer(String fieldName) {
>     Analyzer analyzer = analyzers.get(fieldName);
>     return analyzer != null ? analyzer : getDynamicFieldType(fieldName).getQueryAnalyzer();
>   }
> }
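>
> The effect of keying by analyzer is easy to demonstrate outside Lucene. Below is a self-contained, hypothetical sketch (plain Java, no Lucene classes; Component stands in for TokenStreamComponents and a shared Object for the analyzer):
>
>     import java.util.HashMap;
>     import java.util.Map;
>
>     public class ReuseDemo {
>       static class Component {
>         final char[] buffer = new char[16 * 1024]; // 16K chars = ~32KB, like zzBuffer
>       }
>
>       static final Object TEXT_GENERAL = new Object(); // one "analyzer" for the shared type
>       static final Map<String, Object> ANALYZERS = new HashMap<String, Object>();
>       static {
>         ANALYZERS.put("title", TEXT_GENERAL); // two fields...
>         ANALYZERS.put("body", TEXT_GENERAL);  // ...of the same type
>       }
>
>       // Per-thread storage keyed by analyzer, as in the patch above.
>       static final ThreadLocal<Map<Object, Component>> STORED =
>           new ThreadLocal<Map<Object, Component>>() {
>             @Override protected Map<Object, Component> initialValue() {
>               return new HashMap<Object, Component>();
>             }
>           };
>
>       static Component componentsFor(String fieldName) {
>         Map<Object, Component> perAnalyzer = STORED.get();
>         Object key = ANALYZERS.get(fieldName);
>         Component c = perAnalyzer.get(key);
>         if (c == null) {
>           perAnalyzer.put(key, c = new Component());
>         }
>         return c;
>       }
>
>       public static void main(String[] args) {
>         // Per-field keying would allocate two ~32KB buffers per thread here;
>         // per-analyzer keying allocates one and reuses it.
>         System.out.println(componentsFor("title") == componentsFor("body")); // true
>       }
>     }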


[jira] [Updated] (SOLR-3684) Frequently full gc while do pressure index

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raintung Li updated SOLR-3684:
------------------------------

    Attachment:     (was: patch.txt)
   

> Frequently full gc while do pressure index
> ------------------------------------------
>
>                 Key: SOLR-3684
>                 URL: https://issues.apache.org/jira/browse/SOLR-3684
>             Project: Solr
>          Issue Type: Improvement
>          Components: multicore
>    Affects Versions: 4.0-ALPHA
>         Environment: System: Linux
> Java process: 4G memory
> Jetty: 1000 threads
> Index: 20 field
> Core: 5
>            Reporter: Raintung Li
>            Priority: Critical
>              Labels: garbage, performance
>             Fix For: 4.0
>
>         Attachments: patch.txt
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> [Issue description and patch quoted in full; see above.]


[jira] [Updated] (SOLR-3684) Frequently full gc while do pressure index

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raintung Li updated SOLR-3684:
------------------------------

    Attachment: patch.txt
   

> Frequently full gc while do pressure index
> ------------------------------------------
>
> [Issue header, description, and patch quoted in full; see above.]
