Solrcloud export all results sorted by score

Edward Turner
Hi all,

As far as I understand, SolrCloud currently does not allow sorting by the
pseudo-field score in the /export request handler (i.e., getting the
results in relevancy order). If we attempt this, we get an exception:
"org.apache.solr.search.SyntaxError: Scoring is not currently supported
with xsort". We could use Solr's cursorMark, but this takes a very long
time ...

Exporting does work, however, when the result set is sorted by a specific
document field that has docValues set to true.

Question:
Does anyone know if/when it will be possible to sort by score in the
/export handler?

Research on the problem:
We've seen https://issues.apache.org/jira/browse/SOLR-5244 and
https://issues.apache.org/jira/browse/SOLR-8664, which are related to this
issue, but don't fix it. Maybe I've missed a more relevant issue?

Our use-case:
We are using SolrCloud in our team and it has added a huge amount of value
for our users.

We show a table of search results ordered by score (relevancy) that was
obtained from sending a query to the standard /select handler. We're
working in the life-sciences domain and it is common for our result sets to
contain many millions of results (unfortunately). After users browse their
results, they then may want to download the results that they see, to do
some post-processing. However, to do this, such that the results appear in
the order that the user originally saw them, we'd need to be able to export
results based on score/relevancy.

Any suggestions or advice on this would be greatly appreciated!

Many thanks!

Edd

PS. Apologies for also posting this on Stack Overflow (
https://stackoverflow.com/questions/58167152/solrcloud-export-all-results-sorted-by-score).
I only discovered the Solr mailing list afterwards and thought it was
probably better to reach out directly to Solr's people (I can share any
answer from this list there afterwards).

Re: Solrcloud export all results sorted by score

Erick Erickson
First, thanks for taking the time to ask a question with enough supporting details that I can hope to be able to answer in one exchange ;). It’s a pleasure to see.

Second, NP with asking on Stack Overflow, they have some excellent answers there. But you’re right, this list gets more Solr-centered eyeballs.

On to your question. I think the best answer is that “/export wasn’t designed to deal with scores”, which you’ll find disappointing.

You could use the Streaming “search” expression (using qt=/select or just leave qt out) but that’ll sort all of the docs you’re exporting into a huge list, which may perform worse than CursorMark even if it doesn’t blow up memory.

The root of the problem is that /export can sort in batches because the values it’s sorting on are contained in each document: it can collect a batch, send it out, then iterate again over the remaining documents.

Score, since it’s dynamic, can’t work that way. Solr has to score _all_ the docs to know where a doc lands in the final set relative to any other doc, so for this to work it would need enough memory to hold the scores of all the docs in an ordered list, which is very expensive. Conceptually this is an ordered list up to maxDoc long. Not only does there have to be enough memory to hold the entire list, every doc also has to be inserted individually, which can kill performance. This is the “deep paging” problem.

In the usual case of returning, say, 20 docs, the sorted list only has to be 20 long; higher-scoring docs evict lower-scoring ones.
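
To illustrate the point about the 20-long list, here is a minimal Python sketch of keeping only the best k scores in a bounded min-heap. It is an illustration of the idea only, not Solr's actual implementation; the function name top_k and the random scores are made up for the example.

```python
import heapq
import random


def top_k(scored_docs, k):
    """Return the k highest-scoring (score, doc_id) pairs, best first."""
    heap = []  # min-heap: heap[0] is the weakest doc currently kept
    for score, doc_id in scored_docs:
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif (score, doc_id) > heap[0]:
            # A better doc evicts the weakest one; the heap never grows past k.
            heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True)


random.seed(0)
docs = [(random.random(), doc_id) for doc_id in range(100_000)]
top20 = top_k(docs, 20)
```

Memory stays proportional to k, however many docs are scored. Exporting everything in score order amounts to k = number of hits, which is where the cost described above comes from.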

So I think CursorMark is your best bet.

Best,
Erick

> On Oct 1, 2019, at 3:59 AM, Edward Turner <[hidden email]> wrote:

Re: Solrcloud export all results sorted by score

Walter Underwood
I had to do this recently on a SolrCloud cluster. I wanted to export all the IDs, but they weren’t stored as docValues.

The fastest approach was to fetch all the IDs in one request. First, I make a request for zero rows to get the numFound. Then I fetch numFound+1000 (in case docs were added while I wasn’t looking) in one request.
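
A rough Python sketch of that two-request approach, for illustration only. The collection URL, the field name id, and the helper names solr_select and fetch_all_ids are assumptions, not details from the cluster described here; the select function is passed in so it can be swapped out.

```python
import json
import urllib.parse
import urllib.request


def solr_select(base_url, params):
    """Send one /select request and return the parsed JSON response."""
    url = base_url + "/select?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def fetch_all_ids(query, select=solr_select,
                  base_url="http://localhost:8983/solr/mycollection",  # assumed
                  margin=1000):
    # Request 1: zero rows, just to learn how many docs match.
    count = select(base_url, {"q": query, "rows": 0, "wt": "json"})
    num_found = count["response"]["numFound"]
    # Request 2: ask for everything, plus a margin for docs added meanwhile.
    page = select(base_url, {"q": query, "fl": "id",
                             "rows": num_found + margin, "wt": "json"})
    return [doc["id"] for doc in page["response"]["docs"]]
```

Note that the whole result is built in memory in a single response, so this only suits result sets that comfortably fit.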

I also have a hairy shell script to do /export on each leader after parsing cluster status. That might be a little large to post to this list, but I can do it if there is general interest.

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)

> On Oct 1, 2019, at 9:14 AM, Erick Erickson <[hidden email]> wrote:

Re: Solrcloud export all results sorted by score

Edward Turner
In reply to this post by Erick Erickson
Hi Erick,

Many thanks for your detailed reply. It's really good information for us to
know, and although not exactly what we wanted to hear (that /export wasn't
designed to handle ranking), it's much better for us to definitively know
one way or the other -- and this allows us to move forward. We'll
experiment by going the cursorMark route. I'm hoping that the bottleneck
then isn't Solr, but rather the fetching and writing of the full records
(we use Solr as just a search engine, which gives us IDs of records of
interest; and we use a separate key-value store to get the actual record
data). Anyway, we'll see and fingers crossed :).

Best wishes,

Edd



On Tue, 1 Oct 2019 at 17:15, Erick Erickson <[hidden email]> wrote:

Re: Solrcloud export all results sorted by score

Jörn Franke
Maybe you can sort later using Spark or similar. For that you don’t need a full-blown cluster; it also runs on localhost.

> On 3 Oct 2019 at 09:49, Edward Turner <[hidden email]> wrote:

Re: Solrcloud export all results sorted by score

Edward Turner
In reply to this post by Walter Underwood
Hi Walter,

Thank you also for your reply. Good to know of your experience. Roughly how
many documents were you fetching? Unfortunately, it's possible that some of
our users could attempt to "download" many records, meaning we'd need to
make a request to Solr where rows >= 150M. A key challenge for us is that
in the life sciences, when more sequencing data comes in, it's possible for
our data-sets to grow extremely quickly. Currently it doubles every 18
months or so (and today we have about 200M records, so not super big right
now).

Best,
Edd
--------------------
Edward Turner


On Tue, 1 Oct 2019 at 17:33, Walter Underwood <[hidden email]> wrote:

Re: Solrcloud export all results sorted by score

Chris Hostetter-3
In reply to this post by Edward Turner

: We show a table of search results ordered by score (relevancy) that was
: obtained from sending a query to the standard /select handler. We're
: working in the life-sciences domain and it is common for our result sets to
: contain many millions of results (unfortunately). After users browse their
: results, they then may want to download the results that they see, to do
: some post-processing. However, to do this, such that the results appear in
: the order that the user originally saw them, we'd need to be able to export
: results based on score/relevancy.

What's your UI & middle layer like for this application and
eventual "download" ?

I'm going to presume your end user facing app is reading the data from
Solr, buffering it locally while formatting it in some user selected
export format, and then giving the user a download link?

In which case using a cursor, and making iterative requests to solr from
your app should work just fine...

https://lucene.apache.org/solr/guide/8_0/pagination-of-results.html#fetching-a-large-number-of-sorted-results-cursors

(The added benefit of cursors over /export is that it doesn't require doc
values on every field you return ... which seems like something that you
might care about if you have large (text) fields and an index growing as
fast as you describe yours growing)
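
For reference, a minimal Python sketch of such an iterative cursor loop, under stated assumptions: the collection URL, the uniqueKey field name id, and the helper names solr_select and iter_cursor are hypothetical. The cursor mechanics follow the guide linked above: start with cursorMark=*, include the uniqueKey field in the sort as a tie-breaker, and stop when nextCursorMark comes back unchanged.

```python
import json
import urllib.parse
import urllib.request


def solr_select(base_url, params):
    """Send one /select request and return the parsed JSON response."""
    url = base_url + "/select?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_cursor(query, select=solr_select,
                base_url="http://localhost:8983/solr/mycollection",  # assumed
                rows=1000, sort="score desc,id asc"):
    """Yield every matching doc in sort order, one page per request."""
    cursor = "*"  # initial cursor value
    while True:
        page = select(base_url, {"q": query, "fl": "id,score", "sort": sort,
                                 "rows": rows, "cursorMark": cursor,
                                 "wt": "json"})
        yield from page["response"]["docs"]
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:  # unchanged cursor: nothing left to fetch
            break
        cursor = next_cursor
```

Because this is a generator, the calling app can format and write each doc as it arrives instead of buffering the full result set.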


If you don't have any sort of middle layer application, and you're just
providing a very thin (ie: javascript) based UI in front of solr,
and need a way to stream a full result set from solr that you can give
your end users raw direct access to ... then i think you're out of luck?


-Hoss
http://www.lucidworks.com/

Re: Solrcloud export all results sorted by score

Edward Turner
Hi Chris,

Good info, thank you for that!

> What's your UI & middle layer like for this application and
> eventual "download" ?

I'm working in a team on the back-end side of things, where we provide a
REST API that can be used by clients, which include our UI, a React JS
based app with various fancy bio visualisations in it. In slightly more
detail, Solr is used purely for search, giving the IDs of the hits. We
then use a key-value store to fetch the entity data for those IDs. So,
generally speaking, each "download" involves:

1. user request asking for data in content-type X
2. our REST app makes solr request
3. IDs <- solr fetches results
4. entities <- fetch from key-value store entities with keys in IDs
5. write entities in format X

Using cursorMark, 3 & 4 will be performed repeatedly until all hits are
fetched; and we may run 3 in a separate thread from 4 & 5, to ensure that
Solr communication need not block fetching entity data / writing. We could
do more optimisation around these tasks, but I'm sure you've already
understood.
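
Roughly, the threading idea could be sketched like this in Python (a simplified illustration, not our actual code; run_download, fetch_entities and write_entity are placeholder names, and id_batches stands for whatever yields one batch of IDs per cursorMark page):

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the ID stream


def run_download(id_batches, fetch_entities, write_entity, max_pending=4):
    """Run step 3 in its own thread while steps 4 & 5 run in the caller."""
    pending = queue.Queue(maxsize=max_pending)  # bounded, so Solr can't race ahead

    def produce():
        try:
            for batch in id_batches:  # step 3: one batch per Solr page
                pending.put(batch)
        finally:
            pending.put(_DONE)  # always unblock the consumer

    producer = threading.Thread(target=produce, daemon=True)
    producer.start()
    while True:
        batch = pending.get()
        if batch is _DONE:
            break
        for entity in fetch_entities(batch):  # step 4: key-value store lookup
            write_entity(entity)              # step 5: write in format X
    producer.join()
```

The bounded queue keeps memory flat if Solr turns out to be faster than the key-value store; error handling (e.g. a failed Solr request in the producer thread) is left out of the sketch.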

Many thanks for your input.

Best,
Edd

On Thu, 3 Oct 2019 at 19:13, Chris Hostetter <[hidden email]> wrote:
