Alphanumeric sort with alphabets first

Srinivasan Narayanan
Hello SOLR experts,

I am new to SOLR and am trying to do an alphanumeric sort on string fields. In my case, however, letters should sort before numbers. I also have a large number of such fields (~2500), any of which can be sorted on alphanumerically at runtime. I've explored the following approaches in SOLR to arrive at a solution:

1) Custom similarity plugin: far-fetched, and probably not even applicable to my use case.

2) Analyzer/tokenizer and regex magic to left-pad the numeric parts with 0s. Two disadvantages: I believe this requires creating extra (copy) fields, which I cannot do (2500 more fields is too many), and it would still sort numbers before letters.

3) Custom function (ValueSource) with regex magic to left-pad numeric tokens with 0s, invoked for sorting only. A bit better than the previous approach, but numbers still come before letters.

4) Custom function (ValueSource) with regex magic to left-pad numeric tokens with 0s and prefix them with a tilde (~), invoked for sorting only. This is where I stand right now. Very ugly, but it works: because the tilde has a very high ASCII value (126, above every letter), it pushes numbers behind letters.

There must be a better approach that I am missing. Please help!
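As a rough illustration (not code from the thread), the sort keys behind approaches 2 through 4 can be sketched in plain Java. The class `AlphaFirstSortKey`, its methods, and `PAD_WIDTH = 10` are hypothetical; this is not the Lucene `ValueSource` API, only the string transformation such a function would compute before SOLR compares values in plain string order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: computes sort keys in the style of approaches 2-4.
final class AlphaFirstSortKey {
    // Assumption: no run of digits in the data is longer than this.
    private static final int PAD_WIDTH = 10;
    private static final Pattern DIGIT_RUN = Pattern.compile("[0-9]+");

    private AlphaFirstSortKey() {}

    /** Approaches 2 and 3: zero-pad every digit run, so "item2" sorts before "item10". */
    static String padNumbers(String value) {
        return rewriteDigitRuns(value, "");
    }

    /** Approach 4: zero-pad every digit run and prefix it with '~' (ASCII 126), so digits sort after letters. */
    static String lettersFirst(String value) {
        return rewriteDigitRuns(value, "~");
    }

    private static String rewriteDigitRuns(String value, String prefix) {
        Matcher m = DIGIT_RUN.matcher(value);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String digits = m.group();
            String padded = "0".repeat(Math.max(0, PAD_WIDTH - digits.length())) + digits;
            // quoteReplacement keeps '$' and '\' from being interpreted specially
            m.appendReplacement(out, Matcher.quoteReplacement(prefix + padded));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Returns a copy of values sorted by comparing their keys in plain String order. */
    static List<String> sortedBy(List<String> values, UnaryOperator<String> key) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(Comparator.comparing(key));
        return copy;
    }
}
```

With `padNumbers`, `[item10, item2, itemB, 10]` sorts as `[10, item2, item10, itemB]` (numbers first, as in approaches 2 and 3); with `lettersFirst` it sorts as `[itemB, item2, item10, 10]`. The sketch also shows why the trick feels fragile: any character above `~` (a non-ASCII letter such as `é`, for instance) sorts after the numbers, and digit runs longer than `PAD_WIDTH` compare incorrectly.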

Re: Alphanumeric sort with alphabets first

Srinivasan Narayanan
Can someone please respond?

From: Srinivasan Narayanan <[hidden email]>
Date: Monday, March 13, 2017 at 3:51 PM
To: "[hidden email]" <[hidden email]>
Subject: Alphanumeric sort with alphabets first

Re: Alphanumeric sort with alphabets first

Erick Erickson
I would back up further and say that 2500 fields is too many from the
start. Why do you need this many fields? And you say you can sort on
any of them... for a corpus of any decent size this is going to chew
up memory like crazy. Admittedly that's OS memory if you use docValues,
but it's still memory.

That said, a custom sort function is probably the way to go if you
really need to.
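A custom sort could also avoid the padding width and the tilde altogether by comparing values directly instead of building a string key. The sketch below is a hypothetical plain-Java comparator (`LettersFirstNaturalOrder`, not a SOLR or Lucene API) that walks both strings token by token: runs of ASCII digits are compared by numeric value without parsing them (so there is no overflow or fixed width), and any non-digit character sorts before a digit. Plugging something like this into SOLR would still need Lucene-side plumbing (for example a custom `FieldComparatorSource`), which is not sketched here.

```java
import java.util.Comparator;

// Hypothetical comparator: non-digits before digits, digit runs compared numerically.
final class LettersFirstNaturalOrder implements Comparator<String> {
    @Override
    public int compare(String a, String b) {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            char ca = a.charAt(i), cb = b.charAt(j);
            boolean da = isAsciiDigit(ca), db = isAsciiDigit(cb);
            if (da && db) {
                // Both positions start a digit run: compare the runs by numeric value.
                int endA = runEnd(a, i), endB = runEnd(b, j);
                int cmp = compareDigitRuns(a.substring(i, endA), b.substring(j, endB));
                if (cmp != 0) return cmp;
                i = endA;
                j = endB;
            } else if (da != db) {
                // Exactly one side is a digit: the non-digit side sorts first.
                return da ? 1 : -1;
            } else {
                // Both non-digits: plain character order.
                if (ca != cb) return Character.compare(ca, cb);
                i++;
                j++;
            }
        }
        // One string is a token-wise prefix of the other: the shorter one sorts first.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    private static int runEnd(String s, int start) {
        int k = start;
        while (k < s.length() && isAsciiDigit(s.charAt(k))) k++;
        return k;
    }

    private static int compareDigitRuns(String x, String y) {
        String sx = stripLeadingZeros(x), sy = stripLeadingZeros(y);
        // More significant digits means a larger number.
        if (sx.length() != sy.length()) return Integer.compare(sx.length(), sy.length());
        int cmp = sx.compareTo(sy);
        if (cmp != 0) return cmp;
        // Same value ("01" vs "1"): fewer leading zeros first, to keep the order total.
        return Integer.compare(x.length(), y.length());
    }

    private static String stripLeadingZeros(String s) {
        int k = 0;
        while (k < s.length() - 1 && s.charAt(k) == '0') k++;
        return s.substring(k);
    }
}
```

Sorting `[item10, 10, itemB, item2, 9]` with this comparator gives `[itemB, item2, item10, 9, 10]`, and a long value such as `12345678901` still sorts after `9`, which a fixed-width padding key would get wrong.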

Best,
Erick

On Thu, Mar 16, 2017 at 9:17 PM, Srinivasan Narayanan
<[hidden email]> wrote:

> Can someone please respond?