Sorting fields of text_general fieldType

Anupam Bhattacharya
I recently came across a strange issue.

Sorting on the title field behaves oddly: Solr appears to order title
strings by case. When sorting in ascending order, titles that start with
a digit come first, then titles that start with an uppercase letter in
alphabetical order, and finally titles that start with a lowercase letter.

The title field is indexed with the text_general fieldType:

<field name="title" type="text_general" indexed="true" stored="true"/>

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

How can I sort so that titles starting with a digit come first, followed by
all other titles in alphabetical order regardless of upper or lower case?

Thanks
Anupam

Re: Sorting fields of text_general fieldType

iorixxx
> The title sort works in a strange manner because the SOLR server treats
> title string based on Upper Case or Lower Case String. Thus if we sort in
> ascending order, first the title with numeric shows up then the titles in
> alphabetical order which starts with Upper Case & after that the titles
> starting with Lowercase.
>
> The title field is indexed as text_general fieldtype.
>
> <field name="title" type="text_general" indexed="true" stored="true"/>

Please see Otis' response http://search-lucene.com/m/uDxTF1scW0d2

Simply create an additional field named title_sortable with the following type:

    <!-- lowercases the entire field value, keeping it as a single token. -->
    <fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.TrimFilterFactory" />
      </analyzer>
    </fieldType>

Populate it via a copyField directive:

  <copyField source="title" dest="title_sortable" maxChars="N"/>

Then sort on it with &sort=title_sortable asc
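
The copyField above only works if title_sortable is itself declared in schema.xml, which the reply does not show. A minimal sketch of that declaration follows. The attribute choices (stored="false", multiValued="false") are assumptions, not part of the original reply; they are chosen because the field is used only for sorting, and a sort field needs a single indexed value per document:

```xml
<!-- Assumed declaration for the sort-only copy of title; adjust to your schema. -->
<!-- Sorting needs an indexed, single-valued field; storing the value is not required. -->
<field name="title_sortable" type="lowercase" indexed="true" stored="false" multiValued="false"/>

<!-- maxChars is optional; it is omitted here so that the whole title is copied. -->
<copyField source="title" dest="title_sortable"/>
```

Documents have to be re-indexed after a schema change like this before the new field holds any values.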


Re: Sorting fields of text_general fieldType

Anupam Bhattacharya
That approach used to work perfectly.

Recently, however, I realized that it does not work with more than 300,000
indexed records.
I am using Solr 3.5.

Is there another way to sort a title field in proper alphabetical
order regardless of upper or lower case?

Regards
Anupam

On Thu, May 17, 2012 at 4:32 PM, Ahmet Arslan <[hidden email]> wrote:

> Please see Otis' response http://search-lucene.com/m/uDxTF1scW0d2
>
> Simply create an additional field named title_sortable with the following type
>
>  <!-- lowercases the entire field value, keeping it as a single token.  -->
>     <fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
>       <analyzer>
>         <tokenizer class="solr.KeywordTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory" />
>         <filter class="solr.TrimFilterFactory" />
>       </analyzer>
>     </fieldType>
>
> Populate it via copyField directive :
>
>   <copyField source="title" dest="title_sortable" maxChars="N"/>
>
> then &sort=title_sortable asc

Re: Sorting fields of text_general fieldType

Lance Norskog-2
Give us some pairs of titles which sort the wrong way.

On Thu, Aug 2, 2012 at 10:06 AM, Anupam Bhattacharya
<[hidden email]> wrote:

> The approach used to work perfectly.
>
> But recently i realized that it is not working for more than 300000 indexed
> records.
> I am using SOLR 3.5 version.
>
> Is there another approach to SORT a title field in proper alphabetical
> order irrespective of Lower case and Upper case.
>
> Regards
> Anupam

--
Lance Norskog
[hidden email]

Re: Sorting fields of text_general fieldType

Anupam Bhattacharya
A few of the titles are as follows:

Embattled JPMorgan boss survives power challenge - Jakarta Globe

Kitten Survives 6500-Mile Trip in China-US Container - Jakarta Globe

Guard survives hail of bullets - Jakarta Post

On Fri, Aug 3, 2012 at 1:09 PM, Lance Norskog <[hidden email]> wrote:

> Give us some pairs of titles which sort the wrong way.

--
Thanks & Regards
Anupam Bhattacharya

Re: Sorting fields of text_general fieldType

Erick Erickson
Did you re-index everything after the change you made? Documents indexed
before the change have no value in the title_sortable field, so they'd all
come out first or last, depending on the sort direction, then sub-sorted by
internal Lucene doc ID.
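
Where documents without a sort value end up can also be pinned down on the field type. As a sketch only (this was not proposed in the thread), the sortMissingLast attribute that the Solr example schema sets on its string-sorted types could be added to the lowercase type, so that documents lacking title_sortable sort after all documents that have a value, in both ascending and descending order:

```xml
<!-- Sketch: the "lowercase" type from earlier in the thread, with sortMissingLast added. -->
<!-- The added attribute is an assumption for illustration, not part of the original advice. -->
<fieldType name="lowercase" class="solr.TextField" sortMissingLast="true" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>
```

This makes un-reindexed documents easier to spot at the end of the result list; it does not replace re-indexing.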

If you have, can you create a small index with, say, 6 titles that sort
improperly and give us the output from your app?
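
A small test index like that could be loaded with a Solr XML update message along these lines. This is a sketch under assumptions: the three titles are the ones posted earlier in the thread, while the "id" field and its values are hypothetical and stand in for whatever uniqueKey the schema defines:

```xml
<!-- Sketch of an update message for a tiny test index. -->
<!-- "id" and its values are assumed; use your schema's uniqueKey field instead. -->
<add>
  <doc>
    <field name="id">1</field>
    <field name="title">Embattled JPMorgan boss survives power challenge - Jakarta Globe</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="title">Kitten Survives 6500-Mile Trip in China-US Container - Jakarta Globe</field>
  </doc>
  <doc>
    <field name="id">3</field>
    <field name="title">Guard survives hail of bullets - Jakarta Post</field>
  </doc>
</add>
```

After posting this to the update handler and committing, a query with &sort=title_sortable asc should list the titles in the order Embattled, Guard, Kitten if the lowercase copy is populated correctly.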

I find it very unlikely that this is really broken. Lots and lots of
people use this all the time, so my first guess is that it's something
you're doing that _seems_ harmless. Don't get me wrong, there could
indeed be a bug here; it just seems unlikely.

To be really safe, I'd stop my Solr server and blow away the
<solr_home>/data/index directory. Remove the directory itself,
not just its contents, and start indexing over again.

Best
Erick

On Fri, Aug 3, 2012 at 4:30 AM, Anupam Bhattacharya <[hidden email]> wrote:

> Few titles are as following:
>
> Embattled JPMorgan boss survives power challenge - Jakarta Globe
>
> Kitten Survives 6500-Mile Trip in China-US Container - Jakarta Globe
>
> Guard survives hail of bullets - Jakarta Post