Navigation/Paging

Sebastian Riemer
Hi,

In our web app, when displaying result lists from Solr, we've successfully introduced paging via the params 'start' and 'rows' and it's working quite well.

Our navigation in list screens looks like this:

<< First   < Prev   1 - 15 of 62181   Next >   Last >>

One can navigate to the first page, previous page, next page and last page. All of this is done by adjusting the param "start" accordingly, i.e. by adding or subtracting the page size.
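As a rough sketch of the offset arithmetic described above (not our actual code): the function name `page_starts` and the returned dictionary keys are made up for illustration, and `num_found` stands for the total hit count that Solr reports as `numFound`.

```python
def page_starts(start: int, rows: int, num_found: int) -> dict:
    """Return the 'start' value behind each link of the list navigation bar.

    start     -- offset of the page currently shown (a multiple of rows)
    rows      -- page size
    num_found -- total number of hits for the query
    """
    if rows <= 0:
        raise ValueError("rows must be positive")
    # Offset of the last page: the largest multiple of rows that is
    # still smaller than num_found (0 for an empty result).
    last = ((num_found - 1) // rows) * rows if num_found > 0 else 0
    return {
        "first": 0,
        "prev": max(start - rows, 0),     # never go below the first page
        "next": min(start + rows, last),  # never go past the last page
        "last": last,
    }
```

With 62181 hits and 15 rows per page, the last page would start at offset 62175 and hold the remaining 6 documents.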

However, now we want to introduce a similar navigation in our detail views, where only ever one document is displayed. Again, the navigation bar looks like this:

<< First   < Prev   1 - 15 of 62181   Next >   Last >>

But now, Prev / Next shall open the previous / next _document_ instead of the previous / next page. The same goes for First and Last: they shall open the first / last _document_, not the page.

Our first approach to this was to simply add the param "fl=id" so we only get the IDs of documents and set page size to ALL (i.e. no restriction on param "rows"). That way, it was easy to extract the current document id from the result list, and check which id was preceding and succeeding the current id, as well as getting the very first id and the very last id, in order to render the navigation bar.

This led to Solr being under heavy load, since it must load 62181 documents (in this example) in order to return the ids. I somehow thought this would be easy for Solr to do, but it isn't.

Our second approach was to simply keep the same values for the params "start" and "rows", since the user is always selecting a document from the list, so the selected document is already within the page. The edge cases are when the selected document is the very first or the very last one on the page: then the previous or next document id is not within the page result from Solr. I guess we could handle this by checking for that case and sending a second query with the param "start" adjusted accordingly.

However, I would not know how to retrieve the ids of the very first and the very last document (except by executing separate queries with, I guess, start=0, rows=1 and start=62181, rows=1).
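For illustration only, the two separate queries could be parameterised roughly as below. The helper name `first_last_params` is hypothetical, and `base` is assumed to hold whatever query and sort params the list view already uses. Since "start" is zero-based, the last of 62181 documents would sit at start=62180 rather than start=62181.

```python
def first_last_params(base: dict, num_found: int) -> tuple:
    """Build Solr request params for the first and the last document id.

    base      -- the params of the list query (q, sort, fq, ...), left unmodified
    num_found -- total number of hits, as reported by Solr in 'numFound'
    """
    # Only the id is needed, and only one document per query.
    first = dict(base, start=0, rows=1, fl="id")
    # start is zero-based, so the last document is at num_found - 1.
    last = dict(base, start=max(num_found - 1, 0), rows=1, fl="id")
    return first, last
```

The query for the last document uses a very high start value, so it is the more expensive of the two.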

TL;DR:
For any query and a documentId (which is known to be within the query result), what is a simple and sufficiently efficient way to get the following navigational information:

- Previous document id
- Next document id
- First document id
- Last document id

Can this sort of requirement be handled within one Solr query? Should I use cursorMark in this scenario?

Best regards,

Sebastian

Re: Navigation/Paging

Rick Leir-2
Sebastian,
Can you not just handle this in your JavaScript? Your request will always get 15 rows: start=0, then start=15, and so on. In the details view you only show one of the documents, of course, and when the user is viewing the last of the 15 and clicks Next, you request the next 15.
When the user is viewing the first of the 15 and clicks Previous, you request the previous 15.
Am I missing something here?
Rick
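The stepping logic could look something like this sketch. It is written in Python for brevity (the same arithmetic works in JavaScript). The function name `step` and its return shape are invented for illustration, and it assumes "start" is always a multiple of "rows".

```python
def step(start: int, rows: int, index: int, num_found: int, delta: int):
    """Move delta documents (+1 for Next, -1 for Previous) in the detail view.

    start     -- offset of the page currently held by the client
    rows      -- page size (15 in this thread)
    index     -- position of the displayed document within that page
    num_found -- total number of hits

    Returns (new_start, new_index, need_fetch), or None when the move
    would leave the result set.
    """
    pos = start + index + delta         # absolute position of the target
    if pos < 0 or pos >= num_found:
        return None                     # already at the first/last document
    new_start = (pos // rows) * rows    # page that contains the target
    # A new request is needed only when the target is on another page.
    return new_start, pos - new_start, new_start != start
```

For example, `step(0, 15, 14, 62181, +1)` gives `(15, 0, True)`: the client has to request the page at start=15 and show its first document.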


--
Sorry for being brief. Alternate email is rickleir at yahoo dot com
Re: Navigation/Paging

Shawn Heisey-2
In reply to this post by Sebastian Riemer
On 3/13/2018 10:26 AM, Sebastian Riemer wrote:
> However, now we want to introduce a similar navigation in our detail views, where only ever one document is displayed. Again, the navigation bar looks like this:
>
> << First   < Prev   1 - 15 of 62181   Next >   Last >>
>
> But now, Prev / Next shall open up the previous / next _document_ instead of the next page. The same goes for First and Last, it shall open the first / last _document_ not the page.
>
> Our first approach to this was to simply add the param "fl=id" so we only get the IDs of documents and set page size to ALL (i.e. no restriction on param "rows"). That way, it was easy to extract the current document id from the result list, and check which id was preceding and succeeding the current id, as well as getting the very first id and the very last id, in order to render the navigation bar.
>
> This led to Solr being under heavy load, since it must load 62181 documents (in this example) in order to return the ids. I somehow thought this would be easy for Solr to do, but it isn't.

This will indeed be very slow.  And you only have 62181 documents in
your result set, which is pretty easy for Solr to handle.  For a search
that has 100 million results, this approach is *impossible*.  I do have
searches like this on my index, and my index is not all that big
compared to some of the indexes that the community has built.

> Our second approach was, to simply keep the same value for params "start" and "rows" since the user is always selecting a document from the list - thus the selected document already is within the page. However, the edge cases are, the selected document is the very first on the page or the very last one, thus the previous or next document id is not within the page result from solr -> I guess this we could handle by simply checking and sending a second query where the param "start" would be adjusted accordingly.

Detail pages often include information that you do not want to store in
Solr.  A well-tuned Solr install will have responses that contain
everything that the application needs to build a search result grid, but
for really detailed information, the application should probably be
using the id information received from Solr to go to the main data
repository and retrieve full details.

Additionally, you should not allow the user to navigate to the last page
or to navigate to the last document, or even a page/document anywhere
near the end of the resultset.  The reason for this is that really high
start values are a serious performance killer.  61K is definitely a
start value high enough to see performance drops.  If the user tries to
page too deeply into results, your application should simply refuse to
go any further.  For comparison purposes -- the last time I checked how
deeply Google would let me go into a search result, I could get to page
39, but no further.  The number of results for my search was MILLIONS,
but Google wouldn't let me view them all.  The performance issues for
deep paging are universal for search engines, especially when it is
possible to jump to an arbitrary page number.

I recommend limiting how many results a user can page through to about
5000 or 10000.  If there are 50 results per page, this allows them to
get to at least page 99.  In general, most users of search engines will
never go deeper than about page 3.  There are some kinds of applications
where a typical user might visit the first few dozen pages ... but
anything deeper is NOT common.  If you have an atypical user, they are
probably prepared for large page numbers to take a lot longer to load. 
The main reason you should be limiting how deep users can go is that
when one user is going thousands of documents into a result set,
performance of the other queries on the system CAN drop dramatically.
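A minimal sketch of such a limit on the application side, assuming a cap of 10000 documents. The names `MAX_DEPTH` and `clamp_start` are made up for illustration. Clamping the requested offset is one possible reading of "refuse to go any further"; returning an error to the user would be another.

```python
MAX_DEPTH = 10_000  # assumed cap on how many results a user may page through


def clamp_start(requested_start: int, rows: int, num_found: int,
                max_depth: int = MAX_DEPTH) -> int:
    """Limit a requested 'start' value to the deepest page that is allowed."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    # Only the first max_depth hits are reachable, or fewer if the
    # result set itself is smaller.
    reachable = min(num_found, max_depth)
    if reachable <= 0:
        return 0
    # Offset of the last page that lies fully within the reachable range.
    last_start = ((reachable - 1) // rows) * rows
    return max(0, min(requested_start, last_start))
```

With 50 rows per page and the assumed cap, a request for start=62175 would be answered with the page at start=9950 instead.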

> However I would not know how to retrieve the id of the very first document and the very last document (except for executing separate queries with I guess start=0, rows=1 and start=62181 and rows=1)

When you display a page of results, your application already has the N
document IDs that it received from Solr for that page.  Using
that information, you can navigate through the documents one at a time. 
Then if you reach the end of what you have on that page, you can issue
another query for the next page or the previous page.  If you are
restricting how deep a user can go, the performance of this approach
should be pretty good.

> For any query and a documentId (of which it is known it is within the query result), what is a simple and efficient enough way, to get the following navigational information:
>
> -          Previous document Id
>
> -          Next document id
>
> -          First document id
>
> -          Last document id

Having this information available is nearly impossible.

The values for each document will depend on the sort you use.  Change
the sort, and all the values will be wrong.  And if you delete documents
or add documents, those values will likely change, and the values for an
individual document could change several times per second.  Solr cannot
automatically provide this information, and it is pretty much impossible
to have accurate and up to date information if you calculate it at index
time and add it yourself.

Side note:  When sorting by relevance score, which is the default sort
order, changing the query also changes the sort.

----

Note that there *is* a Solr solution for the performance problems of
deep paging ... but cursorMark (the name of the feature) does not
support jumping directly to an arbitrary page number.  If you want page
25000 when using cursorMark, you have to retrieve the first 24999 pages
before you will have the cursor value for page 25000.  But once you HAVE
that value, retrieving page 25000 will be just as fast as page 1, which
is definitely not the case when using start/rows to get pages.

https://lucidworks.com/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
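As a hedged sketch of how cursor-based iteration works: the first request sends cursorMark=*, each response carries a nextCursorMark, and iteration ends when Solr returns the same cursor that was sent. The sort must include the uniqueKey field as a tie-breaker (for example "score desc, id asc"). The `fetch` callable below is an assumption; it stands for whatever HTTP client code sends the params to Solr and returns the parsed JSON response.

```python
from typing import Callable, Iterator


def iterate_with_cursor(fetch: Callable[[dict], dict], query: str,
                        sort: str, rows: int) -> Iterator[list]:
    """Yield successive pages of documents using Solr's cursorMark.

    fetch -- hypothetical helper: takes request params, returns the
             parsed JSON response of a Solr select request
    sort  -- must include the uniqueKey field, e.g. "score desc, id asc"
    """
    cursor = "*"  # documented starting value for a new cursor
    while True:
        params = {"q": query, "sort": sort, "rows": rows,
                  "cursorMark": cursor}  # no 'start' param with cursors
        response = fetch(params)
        docs = response["response"]["docs"]
        if docs:
            yield docs
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:
            break  # an unchanged cursor means the end has been reached
        cursor = next_cursor
```

This only walks forward page by page, so there is no way to jump straight to an arbitrary page with it.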

Newer versions of Solr also have things like the export handler and
streaming expressions, which are designed to provide REALLY large result
sets without putting major load on the server.  Very large result sets
do still take a lot of TIME, so they're only usable for offline
activities like research and data mining, not live usage in an
application.  But they won't kill the server when they are used.  I do
not know how to use these features, but information is available in the
Solr Reference Guide.

Thanks,
Shawn

Re: Navigation/Paging

Sebastian Riemer
In reply to this post by Rick Leir-2
Hi Rick,

thanks for pointing this out - that's the solution I was thinking about too

"... -> I guess this we could handle by
>simply checking and sending a second query where the param "start"
>would be adjusted accordingly ..."

Just checking if there are other options,

Thanks again!

Sebastian

Re: Navigation/Paging

Sebastian Riemer
In reply to this post by Shawn Heisey-2
Dear Shawn,

thank you so much for taking the time for this detailed answer! It helps me very much and I'm very grateful.

1) As you've suggested, we already load the data for detail pages from our relational db, just using the documentId from Solr to look it up.
2) Our index size won't ever reach millions of records, as is common in other users' scenarios. Having 60000 documents in a search result is currently the maximum a single client can ever get when not specifying _any_ filter criteria.

-> I'll have to think about whether to prevent the user from deep paging into big search results, or just accept a possible performance hit (as you've pointed out, a typical user usually won't page further than a couple of pages). The same goes for jumping to the very end of a search result. Currently I kind of like this feature, so I'll try to keep it in.

For retrieving the previous/next documentId when I'm at the start/end of the current page, I'll use the approach you (and Rick) suggested. Thanks!
 
Best wishes,

Sebastian
