CursorMarks and 'end of results'


CursorMarks and 'end of results'

David Frese
Hi List,

the documentation of 'cursorMark' recommends fetching until a query
returns the same cursorMark that was passed in to the request.

But that always requires an additional request at the end, so I wonder
if I can stop earlier, when a request returns fewer results than
requested (num rows). No new documents will be added during the search
in my use case, so could there ever be a non-empty 'page' after a
non-full 'page'?

Thanks very much.
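[Editor's note: the two stop conditions under discussion can be sketched as below. This is a minimal in-memory simulation, not a real Solr client; `fake_solr` is a made-up stand-in for a GET to `/select` with a `cursorMark` parameter, and it encodes the cursor as a plain offset for illustration.]

```python
# Simulated cursorMark paging. Real Solr returns an opaque nextCursorMark;
# here the cursor is just a stringified offset so the loop logic is testable.

ROWS = 10

def fake_solr(docs, cursor_mark, rows=ROWS):
    """Return one page of results plus nextCursorMark. Like real Solr,
    it returns the same cursorMark back once the result set is exhausted."""
    start = int(cursor_mark)
    page = docs[start:start + rows]
    next_mark = str(start + len(page)) if page else cursor_mark
    return page, next_mark

def fetch_all(docs):
    """Drain the cursor. The len(page) < ROWS early exit is only valid
    when no documents are added while the scan runs; the cursor-equality
    check is the fully general stop condition from the documentation."""
    out, cursor = [], "0"
    while True:
        page, next_cursor = fake_solr(docs, cursor)
        out.extend(page)
        if len(page) < ROWS:
            break  # short page: done, and the final empty request is skipped
        if next_cursor == cursor:
            break  # general stop condition: cursor came back unchanged
        cursor = next_cursor
    return out
```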

--
David Frese
+49 7071 70896 75

Active Group GmbH
Hechinger Str. 12/1, 72072 Tübingen
Register court: Amtsgericht Stuttgart, HRB 224404
Managing Director: Dr. Michael Sperber

Re: CursorMarks and 'end of results'

Anshum Gupta-3
Hi David,

The cursorMark would be the same if you get back fewer than the max records requested, so you could exit then, as per the documentation.

I think the documentation says just what you are suggesting, but if you think it could be improved, feel free to put up a patch.


 Anshum



Re: CursorMarks and 'end of results'

Anshum Gupta-3
I might have been wrong there. Having an explicit check for the number of results returned vs. the rows requested would allow you to avoid the last request, which would otherwise come back with 0 results. That check isn't done automatically within Solr.
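[Editor's note: a back-of-the-envelope way to see what the explicit check saves. `requests_needed` is a hypothetical helper, not part of any Solr API; it just counts round trips for a result set of `num_found` docs.]

```python
import math

def requests_needed(num_found, rows, short_page_check):
    """Count /select round trips to drain a cursor over num_found docs.

    With short_page_check, the client stops as soon as a page is smaller
    than rows; otherwise it stops only when the cursorMark comes back
    unchanged, which always costs one extra request returning 0 docs."""
    full_pages = math.ceil(num_found / rows)
    if short_page_check and num_found % rows != 0:
        return max(full_pages, 1)   # last page is short; nothing extra needed
    return full_pages + 1           # a final empty page is needed to detect the end
```

Note the `num_found % rows == 0` case: the short-page check saves nothing there, which is exactly the corner case Chris Hostetter raises later in the thread.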

 Anshum



Re: CursorMarks and 'end of results'

Chris Hostetter-3
In reply to this post by David Frese

You could stop then -- if that fits your use case -- but the documentation
(in particular the sentence you are referring to) is trying to be as
straightforward and general as possible ... which includes the use case
where someone is "tailing" an index and documents may be continually
added.

When originally writing those docs, I did have a bit in there about
*either* getting back fewer than "rows" docs *or* getting back the same
cursor you passed in (to try to cover both use cases as efficiently as
possible), but it seemed more confusing -- and I was worried people might
be surprised/confused when the number of docs was perfectly divisible by
"rows", so the "fewer than rows" case could still wind up in a final
request that returned 0 docs.

The current docs seemed like a good balance between brevity & clarity,
with the added bonus of being correct :)

But as Anshum said: if you have suggested improvements for rewording,
patches/PRs are certainly welcome.  It's hard to have a good perspective on
what docs are helpful to new users when you have been working with the
software for 14 years and wrote the code in question.



-Hoss
http://www.lucidworks.com/

Re: CursorMarks and 'end of results'

David Frese

Thank you very much for the clarification.

It basically cuts the search time in half in the usual case for us,
so it's an important 'feature'.



Re: CursorMarks and 'end of results'

Erick Erickson
bq. It basically cuts down the search time in half in the usual case
for us, so it's an important 'feature'.

Wait. You mean that the "extra" call to get back 0 rows doubles your
query time? That's surprising, tell us more.

How many times does your "usual" use case call using cursorMark? My
off-the-cuff explanation would be that you usually get all the rows in
the first call.

CursorMark is intended to help with the "deep paging" problem, i.e.
where start=some_big_number, to allow returning large result sets in
chunks, say through 10s of K rows. Part of my puzzlement is that in
that case the overhead of the last call is minuscule compared to the
rest.

There's no reason that it can't be used for small result sets; those
are just usually handled by setting the start parameter. Up through,
say, 1,000 or so the extra overhead is pretty unnoticeable. So my head
was stuck in "what's the problem with 1 extra call after making the
first 50?".

OTOH, if you make 100 successive calls to search with the cursorMark
and call 101 takes as long as the previous 100, something's horribly
wrong.

Best,
Erick
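[Editor's note: a rough sketch of the cost intuition behind the "deep paging" point above. This is illustrative arithmetic only, assuming each start=N request makes Solr rank the top N+rows documents, while a cursor only ranks roughly rows documents per request.]

```python
def docs_ranked(num_found, rows, use_cursor):
    """Total documents Solr must rank across all requests to page
    through num_found results. start/rows re-ranks the whole prefix on
    every call; a cursor's per-request work stays proportional to rows."""
    total, fetched = 0, 0
    while fetched < num_found:
        page = min(rows, num_found - fetched)
        total += page if use_cursor else fetched + page
        fetched += page
    return total
```

For 30 docs at 10 rows per page, start/rows ranks 10 + 20 + 30 = 60 docs in total versus 30 with a cursor; the gap grows quadratically with depth.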



Re: CursorMarks and 'end of results'

David Frese

Hi,

I use it in a server application where I need to process all results in
every case, which can be anywhere between zero and hundreds of
thousands. We use pagination to bound the required memory on "our" side
by processing page after page.

Most cases will fit into one page though - a few hundred results. Our
Solr cluster takes about 5 to 10 seconds (*) for the first 'filled' page
_and_ about the _same time_ again for the second, empty page. So if I
have the guarantee that the second page is always empty, that helps a lot.

Solr 5.5 that is, btw.

(*) Whether it could be faster than 5 seconds is a different issue. But
the query is quite complex, with a lot of AND/OR and BlockJoins too, and
I have no idea if memory is large enough to hold the indices and things
like that. Not really optimized yet.


David.


Re: CursorMarks and 'end of results'

Erick Erickson
OK, that makes sense then.

I don't think we've mentioned streaming as an alternative. It has some
restrictions (it can only export docValues fields), and frankly I don't
really remember how much of it was in 5.5, so you'll have to check.

Streaming is designed exactly to, well, stream the entire result set
out. There's some setup cost, so in your use case, where most queries
don't have all that many hits, the setup may be too onerous, but I
thought I'd mention it.
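[Editor's note: for reference, a hedged sketch of how a request to Solr's /export handler might be built. The collection and field names below are made up for illustration; /export streams the full sorted result set, and both the fl and sort fields must have docValues enabled.]

```python
from urllib.parse import urlencode

def export_url(base, collection, query, fields, sort):
    """Build a URL for Solr's /export handler, which streams the entire
    sorted result set (fl and sort fields must be docValues fields)."""
    params = {"q": query, "fl": ",".join(fields), "sort": sort}
    return f"{base}/{collection}/export?{urlencode(params)}"
```

For example, `export_url("http://localhost:8983/solr", "mycoll", "*:*", ["id", "price_f"], "id asc")` yields a single request that streams every matching document, with no cursor loop at all.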

Best,
Erick
