Simple Sort Is Not Working In Solr 4.7?

Simon Cheng
Hi,

I don't know whether it is my setup or something else, but a very simple
sort is not working in my Solr 4.7 environment.

The query is very simple:
http://localhost:8983/solr/bibs/select?q=author:soros&fl=id,author,title&sort=title+asc&wt=xml&start=0&indent=true

And the output is NOT sorted by title:

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
<lst name="params">
<str name="sort">title asc</str>
<str name="fl">id,author,title</str>
<str name="indent">true</str>
<str name="start">0</str>
<str name="q">author:soros</str>
<str name="wt">xml</str>
</lst>
</lst>
<result name="response" numFound="13" start="0">
<doc>
<str name="id">9018</str>
<arr name="author">
<str>Soros, George, 1930-</str>
</arr>
<str name="title">
The alchemy of finance : reading the mind of the market / George Soros
</str>
</doc>
<doc>
<str name="id">15785</str>
<arr name="author">
<str>Soros, George, 1930-</str>
<str>Soros Foundations</str>
</arr>
<str name="title">Bosnia / by George Soros</str>
</doc>
<doc>
<str name="id">16281</str>
<arr name="author">
<str>Soros, George, 1930-</str>
<str>Soros Foundations</str>
</arr>
<str name="title">
Prospect for European disintegration / by George Soros
</str>
</doc>
<doc>
<str name="id">25807</str>
<arr name="author">
<str>Soros, George</str>
</arr>
<str name="title">
Open society : reforming global capitalism / George Soros
</str>
</doc>
<doc>
<str name="id">27440</str>
<str name="title">George Soros on globalization</str>
<arr name="author">
<str>Soros, George, 1930-</str>
</arr>
</doc>
<doc>
<str name="id">22254</str>
<arr name="author">
<str>Soros, George, 1930-</str>
</arr>
<str name="title">
The crisis of global capitalism : open society endangered / George Soros
</str>
</doc>
<doc>
<str name="id">16914</str>
<arr name="author">
<str>Soros, George, 1930-</str>
<str>Soros Fund Management</str>
</arr>
<str name="title">The theory of reflexivity / by George Soros</str>
</doc>
<doc>
<str name="id">17343</str>
<str name="title">
Financial turmoil in Europe and the United States : essays / George Soros
</str>
<arr name="author">
<str>Soros, George, 1930-</str>
</arr>
</doc>
<doc>
<str name="id">15542</str>
<arr name="author">
<str>Soros, George, 1930-</str>
<str>Harvard Club of New York City</str>
</arr>
<str name="title">
Nationalist dictatorships versus open society / by George Soros
</str>
</doc>
<doc>
<str name="id">15891</str>
<arr name="author">
<str>Soros, George</str>
</arr>
<str name="title">
The new paradigm for financial markets : the credit crisis of 2008 and what
it means / George Soros
</str>
</doc>
</result>
</response>

Thanks in advance for your help,
Simon.

Re: Simple Sort Is Not Working In Solr 4.7?

Alexandre Rafalovitch
What's the field definition for your "title" field? Is it just a string,
or are you tokenizing it?

It should be a string, or a single token cleaned up (e.g. lower-cased)
using KeywordTokenizer. Sorting works on the indexed terms rather than the
stored value, so a tokenized field gives each document many terms and the
resulting order is unpredictable. In the example schema, you will normally
see the original field tokenized and a separate sort field filled from it
with a copyField. In recent Solr versions, docValues are also recommended
for sort fields.
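
A minimal schema.xml sketch of that pattern, assuming Solr 4.x and the stock "string" field type: the tokenized "title" stays searchable, and a hypothetical sort-only copy named "title_sort" (not part of the original schema) holds exactly one untokenized term per document, with docValues enabled.

```xml
<!-- Tokenized field used for searching (as in the original schema) -->
<field name="title" type="text_general" indexed="true" stored="true" multiValued="false"/>

<!-- Hypothetical sort-only field: one untokenized term per document, with docValues -->
<field name="title_sort" type="string" indexed="true" stored="false"
       multiValued="false" docValues="true"/>

<!-- Populate the sort field from the original title at index time -->
<copyField source="title" dest="title_sort"/>
```

After a full re-index, a query would sort on the copy instead, e.g. sort=title_sort+asc, while q= keeps searching the tokenized field.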

Regards,
   Alex.
----
Sign up for my Solr resources newsletter at http://www.solr-start.com/

On 17 February 2015 at 19:52, Simon Cheng <[hidden email]> wrote:
> I don't know whether it is my setup or any other reasons. But the fact is
> that a very simple sort is not working in my Solr 4.7 environment.
>
> The query is very simple :
> http://localhost:8983/solr/bibs/select?q=author:soros&fl=id,author,title&sort=title+asc&wt=xml&start=0&indent=true
>
> And the output is NOT sorted according to title :

Re: Simple Sort Is Not Working In Solr 4.7?

Simon Cheng
Hi Alex,

It's simply defined like this in schema.xml:

   <field name="title" type="text_general" indexed="true" stored="true" multiValued="false"/>

and it is copied to another, multi-valued field, o_title:

   <copyField source="title" dest="o_title"/>

Should I simply change the type to "string" instead?

Thanks again,
Simon.


On Wed, Feb 18, 2015 at 12:00 PM, Alexandre Rafalovitch <[hidden email]>
wrote:

> What's the field definition for your "title" field? Is it just string
> or are you doing some tokenizing?
>
> It should be a string or a single token cleaned up (e.g. lower-cased)
> using KeywordTokenizer. In the example schema, you will normally see
> the original field tokenized and the sort field separately with
> copyField connection. In latest Solr, docValues are also recommended
> for sort fields.
>
> Regards,
>    Alex.
>

Re: Simple Sort Is Not Working In Solr 4.7?

Alexandre Rafalovitch
If you are not searching against the "title" field directly, you can
change it to string. If you are, create a separate field specifically
for sorting. You should be able to use docValues on that field even
in Solr 4.7.

Remember to re-index.

Regards,
   Alex.

On 17 February 2015 at 20:16, Simon Cheng <[hidden email]> wrote:

> Hi Alex,
>
> It's simply defined like this in the schema.xml :
>
>    <field name="title" type="text_general" indexed="true" stored="true"
> multiValued="false"/>
>
> and it is cloned to the other multi-valued field o_title :
>
>    <copyField source="title" dest="o_title"/>
>
> Should I simply change the type to be "string" instead?
>
> Thanks again,
> Simon.

Re: Simple Sort Is Not Working In Solr 4.7?

Simon Cheng
Hi Alex,

It works now that I have added a new field, "s_title", to the schema and
re-indexed:

   <field name="s_title" type="string" indexed="true" stored="false" multiValued="false"/>
   <copyField source="title" dest="s_title"/>

But how can I make the sort ignore leading articles ("A", "An", "The")? As
you can see from the example below:

http://localhost:8983/solr/bibs/select?q=singapore&fl=id,title&sort=s_title+asc&wt=xml&start=0&rows=20&indent=true

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="q">singapore</str>
<str name="indent">true</str>
<str name="fl">id,title</str>
<str name="start">0</str>
<str name="sort">s_title asc</str>
<str name="rows">20</str>
<str name="wt">xml</str>
</lst>
</lst>
<result name="response" numFound="18" start="0">
<doc>
<str name="id">36</str>
<str name="title">
5th SEACEN-Toronto Centre Leadership Seminar for Senior Management of
Central Banks on Financial System Oversight, 16-21 Oct 2005, Singapore
</str>
</doc>
<doc>
<str name="id">70</str>
<str name="title">
Anti-money laundering & counter-terrorism financing / Commercial Affairs
Dept
</str>
</doc>
<doc>
<str name="id">15</str>
<str name="title">
China's anti-secession law : a legal perspective / Zou, Keyuan
</str>
</doc>
<doc>
<str name="id">12</str>
<str name="title">
China's currency peg : firm in the eye of the storm / Calla Wiemer
</str>
</doc>
<doc>
<str name="id">22</str>
<str name="title">
China's politics in 2004 : dawn of the Hu Jintao era / Zheng Yongnian & Lye
Liang Fook
</str>
</doc>
<doc>
<str name="id">92</str>
<str name="title">
Goods and Services Tax Act [2005 ed.] (Chapter 117A)
</str>
</doc>
<doc>
<str name="id">13</str>
<str name="title">
Governing capacity in China : creating a contingent of qualified personnel
/ Kjeld Erik Brodsgaard
</str>
</doc>
<doc>
<str name="id">21</str>
<str name="title">Health care marketization in urban China / Gu Xin</str>
</doc>
<doc>
<str name="id">85</str>
<str name="title">Lianhe Zaobao, Sunday</str>
</doc>
<doc>
<str name="id">84</str>
<str name="title">
Singapore : vision of a global city / Jones Lang LaSalle
</str>
</doc>
<doc>
<str name="id">7</str>
<str name="title">
Singapore real estate investment trusts : leveraged value / Tony Darwell
</str>
</doc>
<doc>
<str name="id">96</str>
<str name="title">
Singapore's success : engineering economic growth / Henri Ghesquiere
</str>
</doc>
<doc>
<str name="id">23</str>
<str name="title">
The Chen-Soong meeting : the beginning of inter-party rapprochement in
Taiwan? / Raymond R. Wu
</str>
</doc>
<doc>
<str name="id">17</str>
<str name="title">
The Haw Par saga in the 1970s / project sponsor, Low Kwok Mun; team leader,
Sandy Ho; team members, Audrey Low ... et al
</str>
</doc>
<doc>
<str name="id">78</str>
<str name="title">The New paper on Sunday</str>
</doc>
<doc>
<str name="id">95</str>
<str name="title">
The little Red Dot : reflections by Singapore's diplomats / editors, Tommy
Koh, Chang Li Lin
</str>
</doc>
<doc>
<str name="id">52</str>
<str name="title">
[Press releases and articles on policy changes affecting the Singapore
property market] / compiled by the Information Resource Centre, Monetary
Authority of Singapore
</str>
</doc>
<doc>
<str name="id">dataq</str>
<str name="title">
Simon is testing Solr - This one is in English. Color of the Wind. 我是中国人 ,
БOΛbШ OЙ PYCCKO-KИTAЙCKИЙ CΛOBAPb , Français-Chinois
</str>
</doc>
</result>
</response>

Re: Simple Sort Is Not Working In Solr 4.7?

Alexandre Rafalovitch
As I mentioned before, you could use the string type if you just want the
title as it is. Or you can use a custom type to normalize the indexed
value, as long as you end up with a single token.

So, if you want to strip leading A/An/The, you can use
KeywordTokenizer, combined with whatever post-processing you need. I
would suggest a LowerCase filter and perhaps a Regex filter to strip off
those leading articles. You may need to iterate a couple of times on
that specific chain.
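
A sketch of such a chain, assuming Solr 4.x's stock analysis factories (KeywordTokenizerFactory, LowerCaseFilterFactory, TrimFilterFactory, and PatternReplaceFilterFactory as the regex filter); the type name "title_sort_noarticles" is hypothetical. It is loosely modelled on the "alphaOnlySort" type in the Solr example schema. Lowercasing runs before the regex, so the pattern only needs lowercase articles, and the whole title stays one token.

```xml
<!-- Hypothetical sort type: whole title as one lowercased token, leading article removed -->
<fieldType name="title_sort_noarticles" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- Keep the entire value as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Case-insensitive ordering; also lets the pattern below match "The" -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Remove leading/trailing whitespace -->
    <filter class="solr.TrimFilterFactory"/>
    <!-- Strip one leading "a", "an" or "the" followed by whitespace -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^(a|an|the)\s+" replacement="" replace="first"/>
  </analyzer>
</fieldType>

<!-- Point the existing sort field at the new type, then re-index -->
<field name="s_title" type="title_sort_noarticles" indexed="true" stored="false" multiValued="false"/>
<copyField source="title" dest="s_title"/>
```

Requiring whitespace after the article keeps titles such as "Anti-money laundering ..." intact, while "The little Red Dot ..." sorts as "little red dot ...". As far as I know, a TextField-based type like this cannot use docValues in Solr 4.x, so it is sorted through the field cache instead.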

The good news is that you can just make a couple of type definitions
with different settings or filter order, reload the core (from the Core
Admin screen of the Web Admin UI), and run some of your sample titles
through those different definitions in the Analysis screen, without
having to reindex.
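
As one possible candidate to compare in the Analysis screen, the hypothetical type below ("title_sort_v2", not from the original schema) also drops leading punctuation, such as the "[" that places "[Press releases and articles ...]" after the "The ..." titles in the output above. It assumes Solr 4.x's PatternReplaceFilterFactory; \p{L} and \p{N} are standard Java regex Unicode classes for letters and digits.

```xml
<!-- Hypothetical variant: also strips leading characters that are not letters or digits -->
<fieldType name="title_sort_v2" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <!-- Drop leading punctuation such as "[" -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^[^\p{L}\p{N}]+" replacement="" replace="first"/>
    <!-- Then drop one leading article followed by whitespace -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^(a|an|the)\s+" replacement="" replace="first"/>
  </analyzer>
</fieldType>
```

Titles starting with a digit, such as "5th SEACEN-Toronto ...", are left unchanged because digits count as allowed leading characters. Only the type that "s_title" finally uses needs a re-index.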

Regards,
  Alex.

On 17 February 2015 at 22:36, Simon Cheng <[hidden email]> wrote:

> Hi Alex,
>
> It's okay after I added in a new field "s_title" in the schema and
> re-indexed.
>
>    <field name="s_title" type="string" indexed="true" stored="false"
> multiValued="false"/>
>    <copyField source="title" dest="s_title"/>
>
> But how can I ignore the articles ("A", "An", "The") in the sorting. As you
> can see from the below example :

Re: Simple Sort Is Not Working In Solr 4.7?

Simon Cheng
That was a great help. Thanks, Alex.


On Wed, Feb 18, 2015 at 2:48 PM, Alexandre Rafalovitch <[hidden email]>
wrote:

> Like I mentioned before. You could use string type if you just want
> title it is. Or you can use a custom type to normalize the indexed
> value, as long as you end up with a single token.
>
> So, if you want to strip leading A/An/The, you can use
> KeywordTokenizer, combined with whatever post-processing you need. I
> would suggest LowerCase filter and perhaps Regex filter to strip off
> those leading articles. You may need to iterate a couple of times on
> that specific chain.
>
> The good news is that you can just make a couple of type definitions
> with different values/order, reload the index (from Cores screen of
> the Web Admin UI) and run some of your sample titles through those
> different definitions without having to reindex in the Analysis
> screen.
>
> Regards,
>   Alex.