Performance penalty for Multivalued field?

Performance penalty for Multivalued field?

Max Hütter
Hi,

I have a Solr instance where many documents contain the same field
several times. Right now I use a stylesheet which collects these
duplicate fields into one field for indexing. I guess this is a case where
I should use a multivalued field, but the problem is that I don't know in
advance (when creating the schema.xml) which fields will actually appear
many times in one document. I thought about just making all fields
multivalued. Is there a problem with this? Will it make my index too
large? Is there a performance penalty for this (probably), and how bad would
that be?
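
For readers unfamiliar with how several values for one field are sent to Solr, here is a minimal sketch. It assumes the XML update format, where a repeated field is expressed as repeated <field> elements inside one <doc>. The helper name build_add_message and the field names id and keyword are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def build_add_message(doc_fields):
    """Build a Solr <add> message from a list of (name, value) pairs.

    A name that occurs more than once is emitted as several <field>
    elements, instead of being merged into one value beforehand.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in doc_fields:
        field = ET.SubElement(doc, "field", {"name": name})
        field.text = value
    return ET.tostring(add, encoding="unicode")


# Hypothetical field names, for illustration only.
message = build_add_message([
    ("id", "doc-1"),
    ("keyword", "solr"),
    ("keyword", "lucene"),
])
print(message)
```

Whether such a document is accepted depends on whether the repeated field is declared multivalued in schema.xml.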

What are your experiences?

Best regards

--
Maximilian Hütter
blue elephant systems GmbH
Wollgrasweg 49
D-70599 Stuttgart

Tel            :  (+49) 0711 - 45 10 17 578
Fax            :  (+49) 0711 - 45 10 17 573
e-mail         :  [hidden email]
Registered office :  Stuttgart, District Court Stuttgart, HRB 24106
Managing directors:  Joachim Hörnle, Thomas Gentsch, Holger Dietrich

Re: Performance penalty for Multivalued field?

Mike Klaas
On 3/15/07, Maximilian Hütter <[hidden email]> wrote:

> Hi,
>
> I have a Solr instance where many documents contain the same field
> several times. Right now I use a stylesheet which collects these
> duplicate fields into one field for indexing. I guess this is a case where
> I should use a multivalued field, but the problem is that I don't know in
> advance (when creating the schema.xml) which fields will actually appear
> many times in one document. I thought about just making all fields
> multivalued. Is there a problem with this? Will it make my index too
> large? Is there a performance penalty for this (probably), and how bad would
> that be?

I would be shocked if you noticed any performance difference between a
single-valued field and a multivalued field with one entry.

-Mike

Re: Performance penalty for Multivalued field?

Chris Hostetter-3

: I would be shocked if you noticed any performance difference between a
: single-valued field and a multivalued field with one entry.

there shouldn't be any difference at all in search performance, or index
size ... marking a field multiValued should only have two effects:

  1) on document add/update no error will be thrown if more than one value
is encountered
  2) the response writer may create a slightly larger response document
since it's rendering a list instead of an item (even if every document
returned only has one item in the list)

(this is one of the big differences between using version=2.0 and
version=2.1 with XmlResponseWriter ... in 2.0 it would only make a list if
there is more than one item - even if the field is multiValued)
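
The two effects listed above can be illustrated with a small toy model. This is only a sketch of the described behavior, not Solr's code: FieldDef, validate_document and render_field are hypothetical names, and the rendering follows the "always a list for a multiValued field" behavior described for version=2.1.

```python
class FieldDef:
    """Hypothetical stand-in for one field declaration in schema.xml."""

    def __init__(self, name, multi_valued=False):
        self.name = name
        self.multi_valued = multi_valued


def validate_document(schema, doc_fields):
    """Group (name, value) pairs by field name.

    Effect 1: a repeated field is an error only if it is single-valued.
    """
    values = {}
    for name, value in doc_fields:
        values.setdefault(name, []).append(value)
    for name, vals in values.items():
        if len(vals) > 1 and not schema[name].multi_valued:
            raise ValueError(f"multiple values for single-valued field {name!r}")
    return values


def render_field(field_def, vals):
    """Effect 2: a multiValued field is rendered as a list, even with one item."""
    return list(vals) if field_def.multi_valued else vals[0]


# Hypothetical schema and document, for illustration only.
schema = {
    "id": FieldDef("id"),
    "keyword": FieldDef("keyword", multi_valued=True),
}
stored = validate_document(schema, [("id", "doc-1"), ("keyword", "solr")])
rendered = {name: render_field(schema[name], vals) for name, vals in stored.items()}
print(rendered)  # {'id': 'doc-1', 'keyword': ['solr']}
```

Sending two values for the single-valued id field in this model raises ValueError, while two values for keyword are both kept.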



-Hoss


Re: Performance penalty for Multivalued field?

jjlarrea
Perhaps not relevant in this case, but for the record there is one more SOLR behavior affected by multiValued:

  3) when faceting, a multiValued field always uses the TermEnum algorithm rather than the FieldCache algorithm.

Depending on the data, this can have a dramatic effect on faceting performance: TermEnum is good for a limited number of different indexed terms in the field, and allows multiple terms per field per document, while FieldCache is good for a large number of indexed values relative to the number of documents, and only allows a single term per field per document.
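
To make the trade-off concrete, here is a simplified model of the two counting strategies. It is a sketch under assumptions, not Solr's implementation: the function names and the in-memory data layout (a dict of posting sets for the term-enumeration case, a per-document list for the field-cache case) are hypothetical.

```python
from collections import Counter


def facet_by_term_enum(postings, result_docs):
    """Visit every indexed term and intersect its documents with the results.

    Work grows with the number of distinct terms; a document may appear
    under any number of terms.
    """
    counts = {}
    for term, docs in postings.items():
        hits = len(docs & result_docs)
        if hits:
            counts[term] = hits
    return counts


def facet_by_field_cache(doc_to_term, result_docs):
    """Look up the one term of each matching document in a per-document array.

    Work grows with the number of matching documents; each document can
    hold at most one term.
    """
    terms = (doc_to_term[doc] for doc in result_docs)
    return dict(Counter(term for term in terms if term is not None))


# Hypothetical data: five documents, one term each (None means no value).
doc_to_term = ["red", "blue", "red", None, "green"]
postings = {}
for doc_id, term in enumerate(doc_to_term):
    if term is not None:
        postings.setdefault(term, set()).add(doc_id)

result_docs = {0, 1, 2, 3}
by_enum = facet_by_term_enum(postings, result_docs)
by_cache = facet_by_field_cache(doc_to_term, result_docs)
print(by_enum == by_cache)
```

Both functions return the same counts on single-valued data; only the first one still works if a document is listed under several terms, which is why it is the one that applies to a multiValued field.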

- J.J.

At 10:57 AM -0700 3/16/07, Chris Hostetter wrote:

>: I would be shocked if you noticed any performance difference between a
>: single-valued field and a multivalued field with one entry.
>
>there shouldn't be any difference at all in search performance, or index
>size ... marking a field multiValued should only have two effects:
>
>  1) on document add/update no error will be thrown if more than one value
>is encountered
>  2) the response writer may create a slightly larger response document
>since it's rendering a list instead of an item (even if every document
>returned only has one item in the list)
>
>(this is one of the big differences between using version=2.0 and
>version=2.1 with XmlResponseWriter ... in 2.0 it would only make a list if
>there is more than one item - even if the field is multiValued)
>
>
>
>-Hoss


Re: Performance penalty for Multivalued field?

Chris Hostetter-3

:   3) when faceting, a multiValued field always uses the TermEnum
: algorithm rather than the FieldCache algorithm.

Damn.  Good catch J.J. ... i totally forgot about that and it certainly is
a "performance penalty" if you use facets.


: depending on the data, this can have a dramatic effect on faceting
: performance: TermEnum is good for a limited number of different indexed
: terms in the field, and allows multiple terms per field per document,
: while FieldCache is good for a large number of indexed values relative
: to the number of documents, and only allows a single term per field per
: document.


-Hoss


Re: Performance penalty for Multivalued field?

Max Hütter
In reply to this post by Chris Hostetter-3
Chris Hostetter wrote:

> : I would be shocked if you noticed any performance difference between a
> : single-valued field and a multivalued field with one entry.
>
> there shouldn't be any difference at all in search performance, or index
> size ... marking a field multiValued should only have two effects:
>
>   1) on document add/update no error will be thrown if more than one value
> is encountered
>   2) the response writer may create a slightly larger response document
> since it's rendering a list instead of an item (even if every document
> returned only has one item in the list)
>
> (this is one of the big differences between using version=2.0 and
> version=2.1 with XmlResponseWriter ... in 2.0 it would only make a list if
> there is more than one item - even if the field is multiValued)
>
>
>
> -Hoss
>
>

So I probably got that wrong about multivalued fields. I thought
they not only avoid throwing an error but actually index both values.
What happens when I use a multivalued field and that field appears twice
in an add/update doc? Is the second value lost?

Best regards,

Max


Re: Performance penalty for Multivalued field?

Chris Hostetter-3

: So I probably got that wrong about multivalued fields. I thought
: they not only avoid throwing an error but actually index both values.
: What happens when I use a multivalued field and that field appears twice
: in an add/update doc? Is the second value lost?

sorry, i guess i should have clarified that there is a "0th" effect: all
values get indexed.  (the error when multiValued=false is to prevent all
the values from getting indexed, since you indicated you didn't want
multiple values when you set that option)




-Hoss