Cursor Performance Issue

Ajay Sharma-3
Hi All,

I am using cursors to search and export documents in Solr, following
https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html#fetching-a-large-number-of-sorted-results-cursors

Solr version: 6.5.0
No. of documents: 10 crore (100 million)

Before implementing the cursor, I was using the start and rows parameters to
fetch records.
Service response time used to be 2 sec.

*Before implementing Cursor Solr URL:*
http://localhost:8080/solr/search/select?q=bird toy&qt=mapping&ps=3&rows=25&mm=100
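
The URL above contains a raw space in q=bird toy, which some HTTP clients may reject or mangle unless it is percent-encoded. A minimal Python sketch of building the same request with the standard library follows. It assumes the host, core name, and parameters shown in the post; build_select_url is a hypothetical helper name used only for illustration.

```python
from urllib.parse import urlencode

# Base select URL taken from the post; adjust host and core as needed.
BASE_URL = "http://localhost:8080/solr/search/select"


def build_select_url(params, base_url=BASE_URL):
    """Return a select URL with every parameter value percent-encoded."""
    return base_url + "?" + urlencode(params)


# Same parameters as the "before" query in the post.
url = build_select_url({
    "q": "bird toy",
    "qt": "mapping",
    "ps": 3,
    "rows": 25,
    "mm": 100,
})
print(url)
```

The same helper can be used for the cursor query, where the sort value (score desc,id asc) also contains a space and a comma that need encoding.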

The request handler looks like this (fl contains approx. 20 fields):
<requestHandler name="mapping" class="solr.SearchHandler">
    <lst name="invariants">
        <str name="defType">edismax</str>
        <str name="indent">on</str>
        <float name="tie">0.01</float>
    </lst>
    <lst name="appends">
        <str name="fl">id,refid,title,smalldesc:""</str>
    </lst>
   <lst name="defaults">
        <str name="echoParams">none</str>
        <str name="wt">json</str>
        <int name="rows">25</int>
        <str name="timeAllowed">15000</str>
        <str name="qf">smalldesc</str>
        <str name="qf">title_text</str>
        <str name="qf">titlews^3</str>
        <str name="qf">sdescnisq</str>
        <str name="qs">1</str>
        <!-- retrieve following fields -->
        <str name="mm">2&lt;-1 4&lt;70%</str>
    </lst>
</requestHandler>

Response with echoParams=all (QTime is 6):
responseHeader: {
status: 0,
QTime: 6,
params: {
    ps: "3",
    echoParams: "all",
    indent: "on",
    fl: "id,refid,title,smalldesc:\"\"",
    tie: "0.01",
    defType: "edismax",
    qf: "customphonetic",
    wt: "json",
   qs: "1",
   qt: "mapping",
   rows: "25",
   q: "bird toy",
   timeAllowed: "15000"
}
},
response: {
numFound: 17,
start: 0,
maxScore: 26.616478,
docs: [
  {
    id: "22347708097",
    refid: "152585558",
    title: "Round BIRD COLOURFUL SWINGING CIRCULAR SITTING TOY",
    smalldesc: "",
    score: 26.616478
 }
]
}

I am now facing a performance issue after implementing the cursor. Service
response time has increased 3 to 4 times, i.e. to 8 sec in some cases.

*After implementing Cursor query is-*
localhost:8080/solr/search/select?q=bird toy&qt=cursor&ps=3&rows=1000&mm=100&sort=score desc,id asc&cursorMark=*

I just added &sort=score desc,id asc&cursorMark=* to the previous query. The
number of rows fetched is now 1000, and fl contains just a single field.
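
For reference, the cursor loop described in the linked guide can be sketched in Python as follows. This is an illustrative sketch, not the poster's service code: iter_cursor and fetch_page are hypothetical helper names, and it assumes the handler returns JSON and that the sort ends on the uniqueKey field (id), as the guide requires.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_page(base_url, params):
    """Send one select request and return the decoded JSON response."""
    with urlopen(base_url + "?" + urlencode(params)) as resp:
        return json.load(resp)


def iter_cursor(base_url, params, fetch=fetch_page):
    """Yield every matching document by following nextCursorMark."""
    # The first request always starts from the special mark "*".
    params = dict(params, cursorMark="*")
    while True:
        data = fetch(base_url, params)
        yield from data["response"]["docs"]
        next_mark = data["nextCursorMark"]
        # Stop condition from the guide: the mark no longer changes
        # once the result set is exhausted.
        if next_mark == params["cursorMark"]:
            break
        params["cursorMark"] = next_mark
```

A call such as iter_cursor("http://localhost:8080/solr/search/select", {"q": "bird toy", "qt": "cursor", "rows": 1000, "sort": "score desc,id asc"}) would then issue one request per page until the cursor mark stops advancing. The fetch argument is injectable so the loop can be exercised without a running Solr instance.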

The request handler remains the same as before; I just changed the name,
changed fl, and added df in defaults.

<requestHandler name="cursor" class="solr.SearchHandler">
   <lst name="invariants">
      <str name="defType">edismax</str>
      <str name="indent">on</str>
      <float name="tie">0.01</float>
   </lst>
   <lst name="appends">
      <str name="fl">refid</str>
   </lst>
   <lst name="defaults">
      <str name="echoParams">none</str>
      <str name="wt">json</str>
      <int name="rows">1000</int>
      <str name="qf">smalldesc</str>
      <str name="qf">title_text</str>
      <str name="qf">titlews^3</str>
      <str name="qf">sdescnisq</str>
      <str name="qs">1</str>
      <str name="mm">2&lt;-1 4&lt;70%</str>
      <str name="df">product_titles</str>
   </lst>
</requestHandler>

Response with the cursor and echoParams=all (*QTime is now 17*, i.e. approx.
3 times the previous QTime):
responseHeader: {
status: 0,
QTime: 17,
params: {
df: "product_titles",
ps: "3",
echoParams: "all",
indent: "on",
fl: "refid",
tie: "0.01",
defType: "edismax",
qf: "customphonetic",
qs: "1",
qt: "cursor",
sort: "score desc,id asc",
rows: "1000",
q: "bird toy",
cursorMark: "*",
}
},
response: {
numFound: 17,
start: 0,
docs: [
{
refid: "152585558"
},
{
refid: "157276077"
}
]
}


When I curl http://localhost:8080/solr/search/select?q=bird toy&qt=mapping&ps=3&rows=25&mm=100, I get results in 3 seconds.
When I curl localhost:8080/solr/search/select?q=bird toy&qt=cursor&ps=3&rows=1000&mm=100&sort=score desc,id asc&cursorMark=*, it takes 8 seconds to return, even when the result count is 0.
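
The QTime values reported above (6 ms and 17 ms) are far smaller than the 3 s and 8 s seen with curl. QTime is generally understood to cover query processing inside Solr, not the time spent writing and transferring the response, so it may help to record both numbers for the same request. A minimal sketch, assuming a caller-supplied function that performs the request and returns the decoded JSON (compare_timings is a hypothetical name):

```python
import time


def compare_timings(fetch):
    """Call fetch() once and return (wall_clock_ms, qtime_ms)."""
    start = time.perf_counter()
    data = fetch()
    wall_ms = (time.perf_counter() - start) * 1000.0
    # QTime is the value Solr reports in responseHeader, in milliseconds.
    return wall_ms, data["responseHeader"]["QTime"]
```

If the wall-clock figure is in seconds while QTime stays in the tens of milliseconds, the extra time is probably being spent outside query execution (for example in the client, the network, or the calling service), which would be worth ruling out before changing the sort or the schema.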

BTW, this is the schema definition of the id field used in the sort:
<field name="id" type="string" indexed="true" stored="true" required="true"
omitNorms="true" multiValued="false"/>

Is this due to the sort I have applied, or have I implemented it the wrong
way?
Please help or point me in the right direction to solve this issue.


Thanks in advance

--
Thanks & Regards,
Ajay Sharma
Product Search
Indiamart Intermesh Ltd.


Re: Cursor Performance Issue

Mike Drob-2
You should be using docValues on your id field, but note that switching this
on would require a reindex.

On Wed, Jan 13, 2021 at 6:04 AM Ajay Sharma <[hidden email]>
wrote:

> [...]

Re: Cursor Performance Issue

Ajay Sharma-3
Hi Mike,

Thanks for your reply.

As far as I remember, docValues is enabled by default since Solr 6.

If it is not, and I reindex the data with docValues="true" for the id field,
how much will my index size increase?
My index size is currently 90 GB.
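
Whether docValues is actually in effect for id depends on the field type definition as well as the field itself, so it can be checked rather than remembered. A rough Python sketch using the Schema API follows. It assumes the core exposes the read-only endpoint /schema/fields/<name> and honours the showDefaults parameter; fetch_field and field_has_docvalues are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def field_has_docvalues(schema_response):
    """Return True if a Schema API field response reports docValues on."""
    return bool(schema_response.get("field", {}).get("docValues", False))


def fetch_field(core_url, field_name):
    """GET <core_url>/schema/fields/<field_name>, including type defaults."""
    query = urlencode({"wt": "json", "showDefaults": "true"})
    with urlopen(f"{core_url}/schema/fields/{field_name}?{query}") as resp:
        return json.load(resp)
```

For the setup in this thread the call would look like field_has_docvalues(fetch_field("http://localhost:8080/solr/search", "id")). This only answers whether docValues is on; it says nothing about how much the index would grow after a reindex.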


On Wed, 13 Jan, 2021, 9:14 pm Mike Drob, <[hidden email]> wrote:

> You should be using docvalues on your id, but note that switching this
> would require a reindex.
