Bad and continuously degrading update performance

Christian Kolodziej
Hello everybody,

I have a question about the performance and the internal workings of the update process. We have an index containing nearly 200,000 entries (one field contains a lot of content); the schema.xml is as follows:

<!-- ... -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
<!-- ... -->
<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <field name="date" type="date" indexed="true" stored="false" required="true" />
  <field name="headline" type="text" indexed="true" stored="true" required="true" />
  <field name="companyid" type="integer" indexed="true" stored="false" required="true" />
  <field name="companyname" type="text" indexed="true" stored="true" required="true" />
  <field name="text" type="text" indexed="true" stored="true" required="true" />
  <field name="language" type="string" indexed="true" stored="false" required="true" />
</fields>
<!-- ... -->

Every five minutes a cron job updates a small number of records (between 1 and perhaps 20) that have been edited. But its speed is not satisfactory: the time it needs grows continuously and had exceeded 4 minutes before we restarted Tomcat. Right after the restart the updates were very fast (17 seconds), but the time soon rose again to 170 seconds and more.
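For context, a batch like this is typically re-sent to Solr's XML update handler as a single `<add>` message. The sketch below assumes the standard `/update` handler and that `id` is the schema's `uniqueKey` (not shown in the excerpt above), so re-adding a document with an existing `id` replaces the old version. All field values are hypothetical placeholders; every field is included because the schema marks all of them `required`.

```xml
<!-- Hypothetical update batch: one <doc> per edited record (placeholder values). -->
<add>
  <doc>
    <field name="id">article-12345</field>
    <!-- DateField expects ISO 8601 in UTC, with a trailing Z -->
    <field name="date">2008-09-09T13:34:00Z</field>
    <field name="headline">Example headline</field>
    <field name="companyid">42</field>
    <field name="companyname">Example Corp</field>
    <field name="text">Full article text goes here.</field>
    <field name="language">en</field>
  </doc>
  <!-- further <doc> elements for the other edited records in this run -->
</add>
```

After the batch is posted, one `<commit/>` message makes the changes visible. Sending `<optimize/>` after every small batch is usually unnecessary: an optimize merges the entire index down to a single segment, so its cost is driven by the full 200,000-document index rather than by the handful of changed records.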

Does anyone have an idea where the problem is? Or is there no problem, and this performance is "normal" for our configuration? I hope there are some tricks out there to improve the performance.

Best regards,
Christian

Re: Bad and continuously degrading update performance

Shalin Shekhar Mangar
Are you calling optimize each time you update? Try reducing autowarmCount on
the caches or turn them off. Five minutes is an aggressive target, but it may
be doable since you update only a few documents each time.
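Background for the cache suggestion: each commit opens a new searcher, and Solr autowarms that searcher's caches by replaying entries from the previous one, so large autowarm counts can make every commit slow. A minimal solrconfig.xml sketch with autowarming disabled might look like this; the cache class and sizes are illustrative placeholders, not tuned values for this index.

```xml
<!-- solrconfig.xml (excerpt): caches with autowarming turned off.
     Sizes are placeholders; adjust to the available heap. -->
<query>
  <!-- filter queries (fq) results -->
  <filterCache
    class="solr.LRUCache"
    size="512"
    initialSize="512"
    autowarmCount="0"/>
  <!-- ordered document id lists for recent queries -->
  <queryResultCache
    class="solr.LRUCache"
    size="512"
    initialSize="512"
    autowarmCount="0"/>
  <!-- stored fields; cannot be autowarmed because internal
       Lucene document ids change between searchers -->
  <documentCache
    class="solr.LRUCache"
    size="512"
    initialSize="512"
    autowarmCount="0"/>
</query>
```

The trade-off is that the first queries after each commit run against cold caches, so query latency may briefly rise; a small non-zero `autowarmCount` is a middle ground between commit speed and warm-cache query speed.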

On Tue, Sep 9, 2008 at 3:34 PM, Kolodziej Christian wrote:



--
Regards,
Shalin Shekhar Mangar.