Build suggester in different directory (not /tmp).

Matthew Roth-2
Hi List,

I am building a few suggesters and I am getting a "No space left on
device" error.


<lst name="error">
<str name="msg">No space left on device</str>
<str name="trace">
java.io.IOException: No space left on device at
sun.nio.ch.FileDispatcherImpl.write0(Native Method) at
...



At first this threw me. df showed I had over 100 GB free. The /data dir the
suggester is being constructed from is only 4 GB. On a subsequent run I
noticed that the suggester is first being built in /tmp. When setting up the
LVM I only allotted 2 GB to that directory, and I prefer to keep it that
way. Is there a way to build the suggesters in an alternative dir? I am
not seeing anything in the documentation
(https://lucene.apache.org/solr/guide/6_6/suggester.html).

I should note that I am using Solr 6.6.0.

Best,
Matt

Re: Build suggester in different directory (not /tmp).

Matthew Roth-2
I have an incomplete solution. I was trying to build three suggesters at
once. If I added the ?suggest.dictionary=<dict> parameter and built one at
a time, it worked out fine. However, this means I will need to set
buildOnCommit and buildOnStartup to false. This is less than ideal.
Building in a different directory would still be preferable.
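
For anyone wanting to script the one-at-a-time workaround, here is a minimal sketch. It assumes the suggest handler is registered at /suggest and that blocking HTTP GETs are acceptable; the host, core name, and dictionary names in the usage note are hypothetical. My guess (not confirmed anywhere in this thread) is that it helps because only one build's temporary files occupy /tmp at a time.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def suggest_build_url(solr_url, core, dictionary, handler="suggest"):
    """Return the URL that asks Solr to build a single named suggester."""
    # "handler" is an assumption: use whatever path solrconfig.xml registers.
    params = urlencode({
        "suggest": "true",
        "suggest.build": "true",
        "suggest.dictionary": dictionary,
    })
    return f"{solr_url.rstrip('/')}/{core}/{handler}?{params}"


def http_get(url):
    """Issue a blocking GET and return the response body."""
    with urlopen(url) as response:
        return response.read()


def build_one_at_a_time(solr_url, core, dictionaries, fetch=http_get):
    """Request each build in turn; the next starts only after the previous returns."""
    for name in dictionaries:
        fetch(suggest_build_url(solr_url, core, name))
```

A call such as build_one_at_a_time("http://localhost:8983/solr", "mycore", ["titleSuggester", "authorSuggester"]) would then issue one build request per dictionary, in order.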


Best,
Matt

On Wed, Dec 20, 2017 at 12:05 PM, Matthew Roth <[hidden email]> wrote:

> <snip>

Re: Build suggester in different directory (not /tmp).

Erick Erickson
bq: this means I will need to set buildOnCommit and buildOnStartup to false.

Be _very_ careful with these settings. Building a suggester can mean reading
the stored field(s) from _every_ document in your index, which can
take a very long time (perhaps hours). You'd pay that penalty every time
you started Solr or committed docs. I almost guarantee that buildOnCommit
will be unsatisfactory.

This is one of those things that works fine for testing a small corpus but
can fall over when you scale up.

As for why the suggester gets built in /tmp, perhaps Mike McCandless has
magic to control that. Nice find, and thanks for sharing it!

Best,
Erick

On Wed, Dec 20, 2017 at 9:27 AM, Matthew Roth <[hidden email]> wrote:

> <snip>

Re: Build suggester in different directory (not /tmp).

Shawn Heisey-2
In reply to this post by Matthew Roth-2
On 12/20/2017 10:05 AM, Matthew Roth wrote:
> I am building a few suggesters and I am getting a "No space left on
> device" error.

<snip>

> At first this threw me. df showed I had over 100 GB free. The /data dir the
> suggester is being constructed from is only 4 GB. On a subsequent run I
> noticed that the suggester is first being built in /tmp. When setting up the
> LVM I only allotted 2 GB to that directory, and I prefer to keep it that
> way.

The code is utilizing the "java.io.tmpdir" system property to determine
a temporary directory location to use for the build, before it is put in
the final location.  On POSIX platforms, this will default to /tmp.

If you are starting Solr manually, then you would just need to add the
following parameter to the bin/solr command line (including the quotes)
to change this location:

-a "-Djava.io.tmpdir=/other/tmp/path"

If you've installed Solr as a service, then I do not think there's any
easy way to adjust this property, other than manually editing bin/solr
to add the -D option to the startup command line.  We'll need an
enhancement issue in Jira to modify the script so it can set
java.io.tmpdir from an environment variable.

Note that adjusting this property may result in other things that Solr
creates being moved away from /tmp.

Since most POSIX operating systems will automatically delete old files
in /tmp, it's always possible that when you move Java's temp directory,
you'll end up with cruft in the new location that never gets deleted. 
Developers do generally try to clean up temporary files, but sometimes
things go wrong that weren't anticipated.  If that does happen and a
temporary file is created by Lucene/Solr that doesn't get deleted, then
I would consider that a bug that should be fixed.

On Windows systems, Java asks the OS where the temp directory is.  The
info I've found says that the TMP environment variable will override
this location for Windows, but not for other platforms.
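
Before pointing java.io.tmpdir at a new location, it may be worth checking that the target filesystem really has more room than /tmp did. The following is a minimal sketch of such a check, not part of Solr. The function names and the size threshold are hypothetical, and it only inspects the filesystem; it does not ask the JVM which directory it is using.

```python
import shutil


def free_bytes(path):
    """Free space, in bytes, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free


def tmpdir_jvm_option(path, min_free_bytes):
    """Return the -D option for `path`, or raise if the filesystem looks too small."""
    available = free_bytes(path)
    if available < min_free_bytes:
        raise RuntimeError(
            f"{path} has {available} bytes free, need at least {min_free_bytes}"
        )
    # This string is what goes inside the quotes after -a on the bin/solr command line.
    return f"-Djava.io.tmpdir={path}"
```

For example, tmpdir_jvm_option("/other/tmp/path", 8 * 1024**3) returns the option only if at least 8 GiB are free there. That threshold is an arbitrary placeholder; pick one based on how large your suggester builds actually get.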

Thanks,
Shawn


Re: Build suggester in different directory (not /tmp).

Matthew Roth-2
Thanks Erick,

I'll heed your warning. Ultimately, the index will be rather static, so I do
not fear much from buildOnCommit. But I think building on startup would
likely be set to false regardless.

Shawn,

Thank you as well. That is very informative regarding java.io.tmpdir. I am
starting this as a service, but I think I can handle making the required
changes.

Best,
Matt

On Wed, Dec 20, 2017 at 2:58 PM, Shawn Heisey <[hidden email]> wrote:

> <snip>

Re: Build suggester in different directory (not /tmp).

Erick Erickson
Matthew:

I think you'll be awfully unhappy with buildOnCommit. Say you're
bulk-indexing and committing every 15 seconds....

buildOnStartup is problematic as well, since it'd rebuild every time
you bounced Solr even if the index hadn't changed.

Personally I'd alter my indexing process to fire a build command when
it was done.

Or, if you can afford to optimize after _every_ set of updates (say
you only update every day or less often) then buildOnOptimize makes
sense.
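
To put rough numbers on this, here is a small sketch. It assumes each commit requests a rebuild when buildOnCommit is true; the load duration is hypothetical, and the callables passed to index_then_build are placeholders for whatever your indexing client provides.

```python
def builds_triggered_by_commits(load_seconds, commit_interval_seconds):
    """Number of commits, and so of rebuilds requested, during a bulk load."""
    return load_seconds // commit_interval_seconds


def index_then_build(batches, send_batch, commit, build_suggester):
    """Index every batch, commit once, then request exactly one suggester build."""
    for batch in batches:
        send_batch(batch)
    commit()
    build_suggester()
```

With commits every 15 seconds, a one-hour load gives builds_triggered_by_commits(3600, 15) == 240 rebuild requests, against a single build when the indexing job fires the build command itself at the end.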

Best,
Erick

On Wed, Dec 20, 2017 at 12:40 PM, Matthew Roth <[hidden email]> wrote:

> <snip>

Re: Build suggester in different directory (not /tmp).

Matthew Roth-2
Erick,

Oh, yes, I think I was misunderstanding buildOnCommit. I presumed it would
run once, following the completion of my DIH import. The behavior you
described would be very problematic!

Thank you for taking the time to point that out!

Best,
Matt

On Wed, Dec 20, 2017 at 3:47 PM, Erick Erickson <[hidden email]> wrote:

> <snip>

Re: Build suggester in different directory (not /tmp).

Clemens Wyss DEV
In reply to this post by Erick Erickson
> I almost guarantee that buildOnCommit will be unsatisfactory

If not "on commit", when should suggestions/spellchecking be updated? And how?

Spellchecking/suggestions in Solr:
What are the best (up-to-date) sources/links for spellchecking and suggestions?

-----Original Message-----
From: Erick Erickson [mailto:[hidden email]]
Sent: Wednesday, 20 December 2017 19:09
To: solr-user <[hidden email]>
Subject: Re: Build suggester in different directory (not /tmp).

<snip>