[lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119


RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Gupta, Rajiv
After creating the directory myself, I'm getting this error:

20161213 164633 [] * FAIL: [FAILED]: Retrying to add doc at path: /u/smoke/presub/logs/cit-fg-adr-neg-rtp.rajivg.1481639632.57959_cmode_1of1/010_cleanup/06_did_bad_happen/.lucyindex/1 :  Folder '/u/smoke/presub/logs/cit-fg-adr-neg-rtp.rajivg.1481639632.57959_cmode_1of1/010_cleanup/06_did_bad_happen/.lucyindex/1' failed check
20161213 164633 [] *    S_init_folder at core/Lucy/Index/Indexer.c line 263
20161213 164633 [] *    at /usr/software/lib/perl5/site_perl/5.14.0/x86_64-linux-thread-multi/Lucy.pm line 122.

Please help, I'm badly stuck now. The error is intermittent; I see it mostly when NFS is under load. Ideally the retry should recover, but it doesn't. I also tried a hack in my code to limit the number of files scanned per directory to 10, but that didn't help either.
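
For reference, the hack is roughly this (trimmed down; $index_path stands for the .lucyindex/1 path, and the surrounding retry loop is omitted):

    use File::Path qw(make_path);

    # Make sure the index directory exists before retrying, then ask the
    # Indexer to open it rather than create it.
    make_path($index_path) unless -d $index_path;
    my $dir_create_flag = 0;    # directory exists now, so never create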

Thanks,
Rajiv Gupta

-----Original Message-----
From: Gupta, Rajiv [mailto:[hidden email]]
Sent: Monday, December 12, 2016 10:08 AM
To: [hidden email]
Subject: RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Can't open '/u/smoke/presub/logs/cit-fg-adr-neg-rtp.rajivg.1481473130.41339_cmode_1of1/.lucyindex/1/seg_fd/lexicon-7.ixix': Invalid argument
20161211 182109 [] *    LUCY_FSFolder_Local_Open_FileHandle_IMP at core/Lucy/Store/FSFolder.c line 118
20161211 182109 [] *    LUCY_Folder_Local_Open_In_IMP at core/Lucy/Store/Folder.c line 101
20161211 182109 [] *    LUCY_Folder_Open_In_IMP at core/Lucy/Store/Folder.c line 75

There are two more failures; they failed for similar reasons:

rename from '/u/smoke/presub/logs/cit-fg-adr-ndo-rtp.rajivg.1481473039.49384_cmode_1of1/.lucyindex/1/schema.temp' to '/u/smoke/presub/logs/cit-fg-adr-ndo-rtp.rajivg.1481473039.49384_cmode_1of1/.lucyindex/1/schema_e4.json' failed: No such file or directory

Can't delete 'lexicon-3.ix'

I believe all three are related to a race condition during parallel indexing and should go away with retries. However, my retries started failing with a different error, which is strange to me: if the directory already exists, shouldn't Lucy skip the create attempt?

20161211 182109 [] * FAIL: [FAILED]: Retrying to add doc at path: /u/smoke/presub/logs/cit-fg-adr-neg-rtp.rajivg.1481473130.41339_cmode_1of1/.lucyindex/1 :  Couldn't create directory '/u/smoke/presub/logs/cit-fg-adr-neg-rtp.rajivg.1481473130.41339_cmode_1of1/.lucyindex/1': No such file or directory
20161211 182109 [] *    LUCY_FSFolder_Initialize_IMP at core/Lucy/Store/FSFolder.c line 102

So all my retry attempts failed as well.


Now I have added one more condition before index creation: check whether the directory exists before retrying :(

My success rate is now 7/10; the target is at least 9/10.

-Rajiv


-----Original Message-----
From: Nick Wellnhofer [mailto:[hidden email]]
Sent: Sunday, December 11, 2016 3:58 PM
To: [hidden email]
Subject: Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

On 10/12/2016 17:25, Gupta, Rajiv wrote:
> Any timeline when 0.6.1 will be released?

0.6.1 is on schedule to be released in a few days.

Nick


Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Nick Wellnhofer
On 13/12/2016 18:05, Gupta, Rajiv wrote:
> After creating the directory myself, I'm getting this error:

Which directory do you try to create? I wouldn't try to make manual changes
inside Lucy's index directory. This will only make things worse.

> Can't open '/u/smoke/presub/logs/cit-fg-adr-neg-rtp.rajivg.1481473130.41339_cmode_1of1/.lucyindex/1/seg_fd/lexicon-7.ixix': Invalid argument
> 20161211 182109 [] *    LUCY_FSFolder_Local_Open_FileHandle_IMP at core/Lucy/Store/FSFolder.c line 118
> 20161211 182109 [] *    LUCY_Folder_Local_Open_In_IMP at core/Lucy/Store/Folder.c line 101
> 20161211 182109 [] *    LUCY_Folder_Open_In_IMP at core/Lucy/Store/Folder.c line 75
>
> There are two more failures; they failed for similar reasons:
>
> rename from '/u/smoke/presub/logs/cit-fg-adr-ndo-rtp.rajivg.1481473039.49384_cmode_1of1/.lucyindex/1/schema.temp' to '/u/smoke/presub/logs/cit-fg-adr-ndo-rtp.rajivg.1481473039.49384_cmode_1of1/.lucyindex/1/schema_e4.json' failed: No such file or directory
>
> Can't delete 'lexicon-3.ix'
>
> I believe all three are related to a race condition during parallel indexing and should go away with retries. However, my retries started failing with a different error, which is strange to me: if the directory already exists, shouldn't Lucy skip the create attempt?
>
> 20161211 182109 [] * FAIL: [FAILED]: Retrying to add doc at path: /u/smoke/presub/logs/cit-fg-adr-neg-rtp.rajivg.1481473130.41339_cmode_1of1/.lucyindex/1 :  Couldn't create directory '/u/smoke/presub/logs/cit-fg-adr-neg-rtp.rajivg.1481473130.41339_cmode_1of1/.lucyindex/1': No such file or directory
> 20161211 182109 [] *    LUCY_FSFolder_Initialize_IMP at core/Lucy/Store/FSFolder.c line 102
>
> So all my retry attempts failed as well.

These errors still look like multiple processes are modifying the index at the
same time. Are you really sure that every indexer is created with an
IndexManager and that every IndexManager is created with a `host` argument
that is unique to each machine?

All these errors mean that there's something fundamentally wrong with your
code or that you hit a bug in Lucy. The only type of error where it makes
sense to retry is LockErr. All other errors are mostly fatal and could result
in index corruption. Retrying will only mask an underlying problem in this case.
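
For completeness, a minimal sketch of the only retry I'd consider (it assumes an existing $index_path and an IndexManager in $manager; anything other than a LockErr is rethrown):

    use Scalar::Util qw(blessed);

    my $indexer;
    for my $try ( 1 .. 10 ) {
        $indexer = eval {
            Lucy::Index::Indexer->new(
                index   => $index_path,
                manager => $manager,
            );
        };
        last if $indexer;
        if ( blessed($@) && $@->isa('Lucy::Store::LockErr') ) {
            sleep 1;    # another process holds the write lock; try again
            next;
        }
        die $@;         # anything else is likely fatal; don't mask it
    }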

Unfortunately, I'm unable to help unless you provide some kind of
self-contained, reproducible test case. I'm aware that this isn't easy,
especially with multiple clients writing to a shared volume.

As I already hinted at, you might want to reconsider your architecture and use
some kind of search server that uses an index on a local filesystem. There are
ready-made platforms on top of Lucy like Dezi, but it isn't too hard to roll
your own solution. This should result in better performance and make the
behavior of your code more predictable.

Nick


RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Gupta, Rajiv
Thanks, Nick, for your reply and for taking the time on this. One quick question before it gets lost in the email below: release 0.6.1 includes the fix for the bug below, right?

> BasicFlexGroup0_io_workload/pm_io/.lucyindex/1 :  input 47 too high
> S_fibonacci at core/Lucy/Index/IndexManager.c line 129

Thanks,
Rajiv g

-----Original Message-----
From: Nick Wellnhofer [mailto:[hidden email]]
Sent: Saturday, December 17, 2016 2:52 AM
To: [hidden email]
Subject: Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

On 13/12/2016 18:05, Gupta, Rajiv wrote:
> After creating the directory myself, I'm getting this error:

Which directory do you try to create? I wouldn't try to make manual changes inside Lucy's index directory. This will only make things worse.

        $indexer = Lucy::Index::Indexer->new(
            index    => $saveindexlocation,
            schema   => $schema,
            manager  => Lucy::Index::IndexManager->new( host => $self->{_hostname} ),
            create   => $dir_create_flag,
            truncate => 0,
        );

The "create" flag initially set to 1 so that $saveindexlocation can get created after I got the error I make sure directory is created and made create flag always 0.

These errors still look like multiple processes are modifying the index at the same time. Are you really sure that every indexer is created with an IndexManager and that every IndexManager is created with a `host` argument that is unique to each machine?

Rajiv>>> All parallel processes are children of a single process and run on the same host. Do you think making the host name unique per process, e.g. by appending a random number, would help?

As I already hinted at, you might want to reconsider your architecture and use some kind of search server that uses an index on a local filesystem. There are ready-made platforms on top of Lucy like Dezi, but it isn't too hard to roll your own solution. This should result in better performance and make the behavior of your code more predictable.

Rajiv>>> Moving to a local file system is not possible in my case. This is a test framework that generates a lot of logs; I index per test run, and all these logs need to be on a shared volume for other triaging purposes. The next thing I'm going to try is to create one watcher per directory and index all files under that directory serially. Currently I create watchers on all the files, so sometimes multiple files in the same directory may get indexed at the same time, which, as you stated, might be the issue. I'm not sure how it will perform within the current time limits. Creating an IndexManager adds overhead to the search process.
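
The per-directory serialization I have in mind is something like this (a sketch; @files_to_index, new_indexer() and slurp() are stand-ins for my existing helpers):

    use File::Basename qw(dirname);

    # Group pending files by directory, then index each group with a single
    # Indexer so files in the same directory never race each other.
    my %by_dir;
    push @{ $by_dir{ dirname($_) } }, $_ for @files_to_index;
    for my $dir ( sort keys %by_dir ) {
        my $indexer = new_indexer($dir);
        $indexer->add_doc( { path => $_, content => slurp($_) } )
            for @{ $by_dir{$dir} };
        $indexer->commit;
    }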

Nick


Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Nick Wellnhofer
On 19/12/2016 04:21, Gupta, Rajiv wrote:
> In release 0.6.1 we have fix for below bug right?
>
>> BasicFlexGroup0_io_workload/pm_io/.lucyindex/1 :  input 47 too high
>> S_fibonacci at core/Lucy/Index/IndexManager.c line 129

Yes, this is fixed in 0.6.1.

Nick


Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Nick Wellnhofer
On 19/12/2016 04:21, Gupta, Rajiv wrote:
> Rajiv>>> All parallel processes are children of a single process and run on the same host. Do you think making the host name unique per process, e.g. by appending a random number, would help?

If you access an index on a shared volume only from a single host, there's
actually no need to set a hostname at all, although it's good practice. It's
all explained in Lucy::Docs::FileLocking:

     http://lucy.apache.org/docs/perl/Lucy/Docs/FileLocking.html

But you should never use different or even random `host` values on the same
machine. This can lead to stale lock files not being deleted after a crash.
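
Concretely (a sketch): derive the `host` value once from the machine's real hostname and reuse it for every IndexManager on that machine:

    use Sys::Hostname qw(hostname);

    # One stable, machine-unique host id: never random, never per-process.
    my $manager = Lucy::Index::IndexManager->new( host => hostname() );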

> Rajiv>>> Moving to a local file system is not possible in my case. This is a test framework that generates a lot of logs; I index per test run, and all these logs need to be on a shared volume for other triaging purposes.

It doesn't matter where the log files are. I'm talking about the location of
your Lucy index directory.

> The next thing I'm going to try is to create one watcher per directory and index all files under that directory serially. Currently I create watchers on all the files, so sometimes multiple files in the same directory may get indexed at the same time, which, as you stated, might be the issue. I'm not sure how it will perform within the current time limits.

By Lucy's design, indexing files in parallel shouldn't cause any problems,
especially if it all happens on a single machine. The worst that could happen
is lock errors, which can be addressed by changing timeouts or retrying. But
without code to reproduce the problem, I can't tell whether it's a Lucy bug.

If you can't provide a test case, it's a good idea to test whether the
problems are caused by parallel indexing at all. I'd also try to move your
indices to a local file system to see whether it makes a difference.
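
For the lock-error case, the timeout knobs live on the IndexManager; a sketch (the values are examples, in milliseconds):

    use Sys::Hostname qw(hostname);

    my $manager = Lucy::Index::IndexManager->new( host => hostname() );
    $manager->set_write_lock_timeout(5000);    # wait up to 5s for the write lock
    $manager->set_write_lock_interval(100);    # poll for it every 100ms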

> Creating an IndexManager adds overhead to the search process.

You only have to use IndexManagers for searchers to avoid errors like "Stale
NFS filehandle". If you have another way to handle such errors, there might be
no need for IndexManagers at all. Again, see Lucy::Docs::FileLocking.
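
On the search side the pattern looks roughly like this (a sketch; $index_path is a placeholder):

    use Sys::Hostname qw(hostname);

    # A manager on the reader takes shared locks on segment files, so the
    # indexing process won't delete them out from under the searcher.
    my $manager = Lucy::Index::IndexManager->new( host => hostname() );
    my $reader  = Lucy::Index::IndexReader->open(
        index   => $index_path,
        manager => $manager,
    );
    my $searcher = Lucy::Search::IndexSearcher->new( index => $reader );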

Nick


RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Gupta, Rajiv
Until now we had been going by http://lucene.472066.n3.nabble.com/lucy-user-Parallel-indexing-in-same-index-dir-using-Lucy-td4160395.html and so were avoiding any kind of parallel indexing.

Let me know your thoughts on this approach: run all indexing in parallel, save the indexes under /tmp (a local filesystem location), and periodically copy them to the shared location. Why copy? Because the servers where I perform searches need access to the indexes. Insertion happens only from one server, but searches can be performed from different servers using the indexed data.
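
Concretely, something like this (a sketch; index_all_logs() is a placeholder for my indexing pass, the paths are made up, and File::Copy::Recursive is just one way to do the copy):

    use File::Copy::Recursive qw(dircopy);

    my $local_index  = "/tmp/run42/.lucyindex/1";                  # local fs
    my $shared_index = "/u/smoke/presub/logs/run42/.lucyindex/1";  # NFS

    index_all_logs($local_index);    # parallel indexing, local fs only
    dircopy( $local_index, $shared_index )
        or die "copy to shared volume failed: $!";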

-Rajiv


RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Gupta, Rajiv
I think you may not have liked the approach :(

However, I tried it and it seems to be working fine. I did 20+ big runs and they all went through.

Just checking: should I use a raw copy, or is there a better way to copy indexes without losing any in-transit data, such as $indexer->add_index($index)?

Thanks,
Rajiv Gupta


Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Peter Karman
I use rsync to copy indexes from one machine to another. A plain copy
probably works too.

Another approach is to have a single indexer and some kind of queue, so
that separate worker machines can push documents-to-be-indexed onto the
queue and the indexer runs periodically to ingest them. Same idea, but
performance may vary depending on the number of workers and the frequency
of updates.
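
E.g., from the indexing box (a sketch; $local_index and $shared_index are placeholder paths):

    # Mirror the finished local index to the shared volume; --delete keeps
    # stale segment files from piling up on the destination.
    system( 'rsync', '-a', '--delete',
            "$local_index/", "$shared_index/" ) == 0
        or die "rsync failed: $?";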

--
Peter Karman . [hidden email] . http://peknet.com/

RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Gupta, Rajiv
Thanks, Peter. For now I'm using copydir. I haven't seen any problems so far, except that the indexes are not available during the copy, which is expected; I've put a retry in place for that.
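
The retry on the search side is just (a sketch; $shared_index is a placeholder):

    my $searcher;
    for my $try ( 1 .. 5 ) {
        $searcher = eval {
            Lucy::Search::IndexSearcher->new( index => $shared_index );
        };
        last if $searcher;
        sleep 2;    # the index may be mid-copy; wait and try again
    }
    die "index still unreadable after retries: $@" unless $searcher;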
