[Data Import Handler] proposal: make FileListEntityProcessor streaming

Marco Bolis
Hello,

I just wrote a patch that makes FileListEntityProcessor stream its results, using Java 8 Streams and NIO.2, instead of buffering the entire file list in memory.
I had to do it because I had a very large list of files (upwards of 1M) and kept running out of memory (OOM).
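
For readers who want a picture of the general idea, here is a minimal sketch of lazy directory traversal with Java 8 Streams and NIO.2. It is not the patch itself and does not use any DIH classes; the class name `StreamingFileList`, the method `forEachMatchingFile`, and its parameters (`baseDir`, `fileNamePattern`, `recursive`) are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical illustration only; not part of Solr or of the patch discussed here.
final class StreamingFileList {

    /**
     * Hands each regular file under baseDir whose name matches fileNamePattern
     * to the given action, one at a time, and returns how many files matched.
     * No list of all matching files is ever built.
     */
    static long forEachMatchingFile(Path baseDir, Pattern fileNamePattern,
                                    boolean recursive, Consumer<Path> action)
            throws IOException {
        // Depth 1 means baseDir and its direct children only.
        int maxDepth = recursive ? Integer.MAX_VALUE : 1;
        AtomicLong matched = new AtomicLong();
        // Files.walk is lazily populated; try-with-resources releases the
        // open directory handles once the traversal is done.
        try (Stream<Path> paths = Files.walk(baseDir, maxDepth)) {
            paths.filter(Files::isRegularFile)
                 .filter(p -> fileNamePattern.matcher(p.getFileName().toString()).matches())
                 .forEach(p -> {
                     action.accept(p);
                     matched.incrementAndGet();
                 });
        }
        return matched.get();
    }

    public static void main(String[] args) throws IOException {
        // Directory to scan: first argument, or the current directory.
        Path baseDir = Paths.get(args.length > 0 ? args[0] : ".");
        long n = forEachMatchingFile(baseDir, Pattern.compile(".*\\.xml"), true,
                                     System.out::println);
        System.out.println(n + " matching files");
    }
}
```

Because each path is consumed as it is discovered, memory use should stay roughly constant with respect to the number of files, whereas collecting every entry into a list first grows with the size of the directory tree.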

I would like to contribute this patch if it is deemed useful.

Regards,
Marco


Re: [Data Import Handler] proposal: make FileListEntityProcessor streaming

Ishan Chattopadhyaya
Please feel free to open a JIRA and submit a PR or patch. FYI, DIH will be deprecated from 8.6 onwards, but it should remain available for use via community-supported packages (which should be ready very shortly). So, your contribution is very much welcome!

Thanks,
Ishan


Re: [Data Import Handler] proposal: make FileListEntityProcessor streaming

Erick Erickson
In reply to this post by Marco Bolis
Marco:

Thanks for volunteering your fix!

The best way is to raise a JIRA, see: https://cwiki.apache.org/confluence/display/solr/HowToContribute#HowToContribute-JIRAtips(ourissue/bugtracker) and attach a patch or pull request. From there we can discuss/give feedback/add to the repo, etc.

Best,
Erick


Re: [Data Import Handler] proposal: make FileListEntityProcessor streaming

Marco Bolis
Thanks for the answers.

Excuse me, I'm new to this: how am I supposed to attach the patch / PR to the issue?
Is it ok to add a diff as attachment?
Should I open the PR and link to it from the issue?

Thank you very much, regards,
Marco


Re: [Data Import Handler] proposal: make FileListEntityProcessor streaming

Erick Erickson
If you’ve created a JIRA login, there should be an “attach files” button on the JIRA. It’s perfectly OK to attach a diff file to the JIRA. It’s preferred to just name it SOLR-#####.patch. Successive versions of the patch should have the exact same name; the old ones are grayed out, making it easy to see which one is the most recent without losing the old versions. No big deal though.

If you’re familiar with Git and have your own fork somewhere, it’s just the usual process of creating a Pull Request from your GitHub repo. If you mention the JIRA when you create the PR by starting the title with “SOLR-#####: any comments you want to make”, it’ll automagically be linked to the JIRA you created. I’ve personally found this a bit confusing because the title you edit is not on the first screen when you hit the “create PR” button. If the automagic linking doesn’t work, just paste a link to the PR in the comments.

Don’t stress over it; if making a PR is bothersome, just attach a diff file. Either one is fine. Code reviews are easier with a PR, but depending on the size of the patch, the easier review may be only marginally beneficial.

Best,
Erick


Re: [Data Import Handler] proposal: make FileListEntityProcessor streaming

Eric Pugh-4
Another thought….  

Since DIH is moving to a community-supported plugin for Solr (https://github.com/rohitbemax/dataimporthandler), maybe you want to focus your efforts on that project?

One of the reasons for moving DIH into its own plugin is to open the door to more contributions from the community, and this is a good example!



_______________________
Eric Pugh Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com | My Free/Busy  
This e-mail and all contents, including attachments, is considered to be Company Confidential unless explicitly stated otherwise, regardless of whether attachments are marked as such.


Re: [Data Import Handler] proposal: make FileListEntityProcessor streaming

Marco Bolis
I see.
How is the transition going to work, Eric?
I understand the community-supported project is going to take over from Solr 9.0, is that correct? Is the DIH code on the Lucene side going to freeze soon?
Thank you for the heads up.

Regards,
Marco



Re: [Data Import Handler] proposal: make FileListEntityProcessor streaming

Eric Pugh-4
I think the whole transition is still a work in progress; however, I believe DIH will be removed either in Solr 9 or in a 9.x release very soon after.

For you, I would suggest you “Skate to where the puck will be” [1]: the future of DIH is as a community-supported plugin for Solr. That community will govern itself how it wants, evaluate the future direction of DIH the way it wants, and generally evolve in its own direction.

I suspect that the appetite for improving DIH with new features etc., which was already an issue, is going to wane rapidly going forward.

Today, if my query is right [2], there are 146 open JIRA issues related to DIH. That’s a huge number, and it speaks to the fact that the current committership isn’t focused on DIH. I suspect your patch will linger in there.

So, my suggestion is to focus on getting your enhancement into the community plugin, and otherwise contribute to a thriving component in that project.





[2] https://issues.apache.org/jira/browse/SOLR-14490?jql=project%20%3D%20SOLR%20AND%20status%20in%20(Open%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20component%20%3D%20%22contrib%20-%20DataImportHandler%22


Re: [Data Import Handler] proposal: make FileListEntityProcessor streaming

Noble Paul നോബിള്‍  नोब्ळ्
In reply to this post by Marco Bolis
The project will go live any time now. That means a user can use it
on any release newer than Solr 8.6. Even if you provide a fix in the
current 8.x branch, it will not be available before the Solr 8.7 release.
OTOH, the DIH plugin will have bug-fix releases independent of Solr
releases, and every user will be able to upgrade their plugin without
upgrading their Solr.

So, please submit PRs to both the external plugin and to Solr.


--
-----------------------------------------------------
Noble Paul


Re: [Data Import Handler] proposal: make FileListEntityProcessor streaming

Marco Bolis
Thank you very much.
I opened PRs to both projects.
Regards,
Marco
