Search Performance and omitNorms

Odysci
I'm using Solr 8.3.1 on a SolrCloud setup with 2 Solr nodes and 2 ZK nodes.
I was experiencing very slow search-with-highlighting on an index that had
'omitNorms="true"' on all fields.
At the suggestion of a Stack Overflow post, I changed all fields to
'omitNorms="false"' and the search-with-highlight time came down to about
1/10th of what it was!

This was a relatively small index and I had no issues with the memory increase.
Now my question is whether I should expect the same speedup on regular
search calls, or on searches with only filters (no query).
This would be on a different, much larger index, and I do not want to incur
the memory increase unless the search is significantly faster.
Does anyone have experience comparing search speed with "omitNorms"
true versus false in regular (non-highlight) search?
Thanks!

Reinaldo
Re: Search Performance and omitNorms

Erick Erickson
I suspect this is spurious. Norms are just an encoding
of the length of a field. Offhand, I have no clue how having
them (or not) would affect highlighting at all.

Term _vectors_, OTOH, could have a major impact. If
FastVectorHighlighter is not used, the highlighter has
to re-analyze the text in order to highlight, and if you’re
highlighting in large text fields that can be very expensive.

Norms aren’t relevant there.

So let’s see the full highlighter configuration you have, along
with the field definition for the field you’re highlighting on.

Best,
Erick

Re: Search Performance and omitNorms

Odysci
Hi Erick,
thanks for the reply.
Just to follow up, I'm using the "unified" highlighter (FastVectorHighlighter
does not work for my purposes). I search and highlight on a multivalued
text field which contains small strings (usually fewer than 200 chars).
This multivalued field goes through several analysis steps (tokenizer, word
delimiter, stemming), and termVectors, termPositions and termOffsets are all
"true".
This is what I'm using:

------------------ schema ------------------
    <fieldType name="documentSearchP" class="solr.TextField"
               positionIncrementGap="100" omitNorms="false">
        <analyzer type="index">
            <tokenizer class="solr.WhitespaceTokenizerFactory" />
            <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms_p.txt"
                    ignoreCase="true" expand="false" />
            <filter class="solr.FlattenGraphFilterFactory" />
            <filter class="solr.ASCIIFoldingFilterFactory" />
            <filter class="solr.WordDelimiterGraphFilterFactory"
                    splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="0"
                    generateWordParts="1" generateNumberParts="1" catenateWords="0"
                    catenateNumbers="0" catenateAll="1" preserveOriginal="1" />
            <filter class="solr.FlattenGraphFilterFactory" />
            <filter class="solr.LowerCaseFilterFactory" />
            <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
            <filter class="solr.PortugueseLightStemFilterFactory" />
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.WhitespaceTokenizerFactory" />
            <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms_p.txt"
                    ignoreCase="true" expand="false" />
            <filter class="solr.ASCIIFoldingFilterFactory" />
            <filter class="solr.WordDelimiterGraphFilterFactory"
                    splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="0"
                    generateWordParts="1" generateNumberParts="1" catenateWords="0"
                    catenateNumbers="0" catenateAll="1" preserveOriginal="1" />
            <filter class="solr.LowerCaseFilterFactory" />
            <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
            <filter class="solr.PortugueseLightStemFilterFactory" />
        </analyzer>
    </fieldType>

    <dynamicField name="*_msearchp" type="documentSearchP" indexed="true"
                  stored="true" required="false" multiValued="true"
                  storeOffsetsWithPositions="true" termVectors="true"
                  termPositions="true" termOffsets="true" />

------------------ schema ------------------

In the Java code I set the following params (assuming the multivalued
field above is called "text_msearchp"):

import org.apache.solr.client.solrj.SolrQuery;

SolrQuery solrQ = new SolrQuery();
solrQ.setFilterQueries( -- set some filters --);
solrQ.setStart(0);
solrQ.setRows( -- set max rows --);
// phrase query on the field: text_msearchp:("...")
solrQ.setQuery("text_msearchp" + ":(\"" + string_being_searched + "\")");
// activate highlighting
solrQ.setHighlight(true);
solrQ.setHighlightSnippets(500);   // normally this number is low

// set highlighter type
solrQ.setParam("hl.method", "unified");
// set highlight field to be the same as the search field
solrQ.setParam("hl.fl", "text_msearchp");
// set the term that will generate the highlight
solrQ.setParam("hl.q", "text_msearchp" + ":(\"" + string_being_searched + "\")");

----------------------------------------------------------------------------
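Since the same field:("...") string is passed to both q and hl.q, it may help to build it in one place and escape the user text first. The following is only a sketch: the class and method names are hypothetical, and it assumes the search text is meant to be matched as a single quoted phrase, so only backslashes and double quotes are escaped. SolrJ also ships ClientUtils.escapeQueryChars for more general escaping.

```java
final class PhraseQueryBuilder {
    private PhraseQueryBuilder() {}

    /** Escapes backslashes and double quotes so the text can sit inside a quoted phrase. */
    static String escapeForPhrase(String text) {
        // Backslashes first, otherwise the ones added for quotes would be doubled.
        return text.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds field:("text"), the shape used for both q and hl.q above. */
    static String phraseQuery(String field, String text) {
        return field + ":(\"" + escapeForPhrase(text) + "\")";
    }

    public static void main(String[] args) {
        // Hypothetical search text containing a double quote.
        System.out.println(phraseQuery("text_msearchp", "casa \"grande\""));
    }
}
```

The result could then be passed to both solrQ.setQuery(...) and solrQ.setParam("hl.q", ...), so the two strings cannot drift apart.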

Still, my tests indicate a significant speed up using omitNorms="false".
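
For anyone who wants to reproduce the comparison, here is a minimal sketch of a timing harness. It is not the code behind the numbers above: the class name, the warm-up and run counts, and the stand-in workload are all assumptions made for illustration. The idea is to run the same query several times untimed, then take the median of the timed runs, once against an index built with omitNorms="true" and once against one reindexed with omitNorms="false". Solr's caches can make repeated identical queries look faster than a cold query, so results should be read with that in mind.

```java
import java.util.Arrays;

final class QueryTimer {
    private QueryTimer() {}

    /** Runs the task `warmup` times untimed, then returns the median of `runs` timed runs, in nanoseconds. */
    static long medianNanos(Runnable task, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) {
            task.run();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        return median(samples);
    }

    /** Median of the samples; for an even count, the integer mean of the two middle values. */
    static long median(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("need at least one sample");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    public static void main(String[] args) {
        // Stand-in workload so the sketch runs on its own; in a real test
        // this would be the SolrJ call that executes the query.
        Runnable standIn = () -> {
            long sum = 0;
            for (int i = 0; i < 100_000; i++) sum += i;
            if (sum < 0) throw new IllegalStateException();
        };
        System.out.println("median ns: " + medianNanos(standIn, 5, 21));
    }
}
```

The median is used instead of the mean so that a single slow run (a GC pause, for example) does not dominate the result.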
Best,

Reinaldo