SOLR 3.3.0 multivalued field sort problem

SOLR 3.3.0 multivalued field sort problem

johnnyisrael
Hi,

I am currently using Solr 1.4.1. With this version, sorting works fine even on multivalued fields.

I am now planning to upgrade from Solr 1.4.1 to 3.3.0, but in this latest version sorting does not work on multivalued fields.

So I am unable to upgrade Solr because of this drawback.

Is there a workaround available for this problem?

Thanks,

Johnny

Re: SOLR 3.3.0 multivalued field sort problem

Péter Király
Hi,

There is no direct solution; you have to create single-valued field(s)
to sort on. I am aware of two workarounds:

- you can use a random or a given (e.g. the first) instance of the
multiple values of the field, and that would be your sortable field.
- you can create two sortable fields, nnnn_min and nnnn_max, which
contain the minimal and maximal values of the given field.

At least, that's what I do. Probably there are other solutions as well.
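
As a rough illustration of both workarounds, the sketch below derives single-valued sort fields from a multivalued one before the document is sent to Solr. The field names (`author`, `author_first`, `author_min`, `author_max`) and the helper `add_sort_fields` are hypothetical, and the derived fields would also need to be declared as single-valued in the schema.

```python
def add_sort_fields(doc, field):
    """Return a copy of doc with single-valued sort fields derived from doc[field].

    The _first, _min and _max suffixes are illustrative names, not a Solr convention.
    """
    values = doc.get(field) or []
    enriched = dict(doc)
    if values:
        enriched[field + "_first"] = values[0]  # workaround 1: a given instance
        enriched[field + "_min"] = min(values)  # workaround 2: smallest value
        enriched[field + "_max"] = max(values)  # workaround 2: largest value
    return enriched


doc = {"id": "1", "author": ["Smith, Adam", "Duck, Dagobert"]}
print(add_sort_fields(doc, "author"))
```

A query would then sort on one of the derived fields (for example `author_min asc`) rather than on the multivalued field itself.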

Péter
--
eXtensible Catalog
http://drupal.org/project/xc



Re: SOLR 3.3.0 multivalued field sort problem

Martijn v Groningen
Hi Johnny,

Sorting on a multivalued field has never really worked in Solr.
Solr versions <= 1.4.1 allowed it, but there was a chance that an error
occurred or that the sort order was not what you expected.
From Solr 3.1 onwards, sorting on a multivalued field isn't allowed and an
HTTP 400 is returned.

Duplicating documents or fields (what Péter describes) is, as far as I know,
the only option until Lucene supports sorting on multivalued fields
properly.

Martijn




--
Kind regards,

Martijn van Groningen

Re: SOLR 3.3.0 multivalued field sort problem

Erick Erickson
The problem I've always had is that I don't quite know what
"sorting on multivalued fields" means. If your field had tokens
aaaaa and zzzzz, would sorting on that field put the doc
at the beginning or end of the list? Sure, you can define
rules (first token, last token, average of all tokens (whatever
that means)), but each solution would be wrong sometime,
somewhere, and/or completely useless.

I'd love to have a better answer....

Best
Erick


Re: SOLR 3.3.0 multivalued field sort problem

Michael Lackhoff-2
On 13.08.2011 18:03 Erick Erickson wrote:

> The problem I've always had is that I don't quite know what
> "sorting on multivalued fields" means. If your field had tokens
> aaaaa and zzzzz, would sorting on that field put the doc
> at the beginning or end of the list? Sure, you can define
> rules (first token, last token, average of all tokens (whatever
> that means)), but each solution would be wrong sometime,
> somewhere, and/or completely useless.

Of course it would need rules but I think it wouldn't be too hard to
find rules that are at least far better than the current situation.

My wish would include an option that decides whether a record is sorted
by just one value of the field or by every value on its own. If the option
is set to FALSE, only the first value would be used; if it is TRUE, every
value of the field would get its own place in the result list.

so, if we have e.g.
record1: ccc and bbb
record2: aaa and zzz
it would be either
record2 (aaa)
record1 (ccc)
or
record2 (aaa)
record1 (bbb)
record1 (ccc)
record2 (zzz)

I find these two outcomes most plausible, so I would allow them if
technically possible, but whatever rule looks more plausible to the
experts: some solution is better than no solution.

-Michael

Re: SOLR 3.3.0 multivalued field sort problem

Martijn v Groningen
The first solution would make sense to me. Some kind of strategy mechanism
for this would allow anyone to define their own rules. Duplicating results
would be confusing to me.




--
Kind regards,

Martijn van Groningen

Re: SOLR 3.3.0 multivalued field sort problem

Billnbell
I have a different use case. Consider a spatial multivalued field with lat/long values for addresses. I would want a sort by geodist() to use the closest distance in each group. For example, find me the closest restaurant, with each doc being a chain name like Pizza Hut, or doctors with multiple offices.
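
To make the intended ordering concrete, here is a small client-side sketch that ranks documents by the nearest of their locations. It does not use Solr's geodist(); the haversine helper, the `locations` key, and the sample coordinates are all assumptions made for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def closest_km(doc, lat, lon):
    """Distance from (lat, lon) to the nearest of the doc's locations."""
    return min(haversine_km(lat, lon, la, lo) for la, lo in doc["locations"])


def sort_by_closest(docs, lat, lon):
    """Order docs by their nearest location, closest first."""
    return sorted(docs, key=lambda d: closest_km(d, lat, lon))


# Made-up coordinates: each doc is a chain with several addresses.
docs = [
    {"name": "chain A", "locations": [(0.0, 0.0), (10.0, 10.0)]},
    {"name": "chain B", "locations": [(1.0, 1.0)]},
]
print([d["name"] for d in sort_by_closest(docs, 10.0, 10.0)])
```

The sort key here is the minimum over all values of the field, which is one specific rule for collapsing a multivalued field to a single sortable number.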

Bill Bell
Sent from mobile



Re: SOLR 3.3.0 multivalued field sort problem

Michael Lackhoff-2
In reply to this post by Martijn v Groningen
On 13.08.2011 20:31 Martijn v Groningen wrote:

> The first solution would make sense to me. Some kind of a strategy mechanism
> for this would allow anyone to define their own rules. Duplicating results
> would be confusing to me.

That is why I would only activate it on request (setting a special
option). Example use case: A library catalogue with an author sort. All
books of an author would be together, no matter how many co-authors the
book has.
So I think it could be useful (as an option) but I have no idea how
difficult it would be to implement. As I said, it would be nice to have
at least something. Any possible customization would be an extra bonus.

-Michael

Re: SOLR 3.3.0 multivalued field sort problem

Erick Erickson
In reply to this post by Michael Lackhoff-2
Fair enough, but what's the "first value in the list"?
There's nothing special about "multiValued" fields,
that is, where the schema has "multiValued=true".
Under the covers, this is no different than just
concatenating all the values together and putting them
in at one go, except for some games with the
position between one term and another
(positionIncrementGap). Part of my confusion is
that the term multi-valued is sometimes used to
refer to "multiValued=true" and sometimes used
to refer to documents with more than one
*token* in a particular field (often as the result
of the analysis chain).
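
A toy model may help show why the index has no notion of a "first value". The sketch below assigns positions to whitespace-separated tokens across all values of a field, adding a gap between consecutive values. It is a simplification: the function `token_positions` is hypothetical, the gap of 100 is just an example setting, and the exact position arithmetic inside Lucene may differ.

```python
def token_positions(values, position_increment_gap=100):
    """Assign a position to every whitespace-separated token of a multivalued field.

    Simplified model: tokens within one value are one position apart, and the
    gap is added on top of that between consecutive values.
    """
    positions = []
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += position_increment_gap  # extra distance between values
        for token in value.split():
            pos += 1
            positions.append((token, pos))
    return positions


print(token_positions(["Smith, Adam", "Duck, Dagobert"]))
```

In this model the result is a single stream of positioned tokens; the value boundaries survive only as larger jumps in position.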

The second case seems to be more in the
grouping/field collapsing arena, although
that doesn't work on fields with more than one
value yet either. But that seems a more sensible
place to put the second case rather than
overloading sorting.

I guess my take on the issue is that sorting has a
pretty specific meaning, and that rather than
overload sorting I'd rather see if the use-cases
are best served by another mechanism.


Best
Erick


Re: SOLR 3.3.0 multivalued field sort problem

Michael Lackhoff-2
On 13.08.2011 21:28 Erick Erickson wrote:

> Fair enough, but what's "first value in the list"?
> There's nothing special about "multiValued" fields,
> that is where the schema has "multiValued=true".
> Under the covers, this is no different than just
> concatenating all the values together and putting them
> in at one go, except for some games with the
> position between one term and another
> (positionIncrementGap). Part of my confusion is
> that the term multi-valued is sometimes used to
> refer to "multiValued=true" and sometimes used
> to refer to documents with more than one
> *token* in a particular field (often as the result
> of the analysis chain)

I guess, since multivalued fields are not really different under the
hood, they should be treated the same. So, no matter if the different
values are the result of a "multiValued=true" or of the analysis chain:
if the whole thing starts with an "a" put it first, if it starts with a
"z" put it last.
Example (multivalued field):
Smith, Adam
Duck, Dagobert
=> sort as "s" (or "S")
Example tokenized field:
This is a tokenized field
=> sort as "t" (or "T")

> The second case seems to be more in the
> grouping/field collapsing arena, although
> that doesn't work on fields with more than one
> value yet either. But that seems a more sensible
> place to put the second case rather than
> overloading sorting.

It depends on how you see the meaning of sorting:
1. Sort the records based on one single value per record (and return
them in this order)
2. Sort the values of the field to sort on (and return the records
belonging to the respective values)

As long as sorting is only allowed on single value fields, both are
identical. As soon as you allow multivalued fields to be sorted on, both
interpretations mean something different and I think both have their
valid use case.
But I don't want to stress this too far.

-Michael