[lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

[lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Gupta, Rajiv
Any idea why I'm getting this error?

Error Invalid path: 'seg_9i/lextemp'
20161205 184114 [] [event_check_for_logfile_completion_in_db][FAILED at DB Query to check logfile completion][Error Invalid path: 'seg_9i/lextemp'
20161205 184114 []  LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119
20161205 184114 []  S_lazy_init at core/Lucy/Index/PostingListWriter.c line 92


In another log file I'm getting a different error:

Error rename from '<Dir>/.lucyindex/1/schema.temp' to '<Dir> /.lucyindex/1/schema_an.json' failed: Invalid argument
20161205 174146 []  LUCY_Schema_Write_IMP at core/Lucy/Plan/Schema.c line 429

This happens when committing the indexer object.


In both cases I'm seeing one common pattern: the timestamps in the STDOUT log file get skewed by 5-6 hrs before the process starts on this file, even though the actual system time has not changed.

-Rajiv

[lucy-user] RE: LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Gupta, Rajiv
I thought that since I'm doing reads and writes together I might be getting file errors, so I tried the FastUpdates method described here - http://lucy.apache.org/docs/perl/Lucy/Docs/Cookbook/FastUpdates.html

But now I'm getting the error below more frequently.
Error input 57 too high
20161206 150630 []  S_fibonacci at core/Lucy/Index/IndexManager.c line 129

My use case is this: while my application is generating multiple logs, I'm indexing them in parallel. To achieve this I'm storing docs at multiple locations, one per directory level. A directory can contain multiple log files, so for each such directory I have one index directory. When a file gets closed, I insert an end-marker doc to indicate that indexing of that file is done. However, sometimes the same file gets opened again with additional data. In that case I search the existing index directory for an end marker; if one is there, I delete it, index the additional data, and insert the end marker again. In this process I search and write at the same time. (A rough sketch of the flow is below.)
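
Roughly, the per-file flow looks like this (only a sketch; the paths and field names are placeholders for what my script actually uses):

    use strict;
    use warnings;
    use Lucy::Search::IndexSearcher;
    use Lucy::Search::QueryParser;
    use Lucy::Index::Indexer;

    my $index_path = '/path/to/.lucyindex/1';    # placeholder
    my $log_file   = 'some_test.log';            # placeholder

    # Look for an existing end marker for this file.
    my $searcher = Lucy::Search::IndexSearcher->new( index => $index_path );
    my $qparser  = Lucy::Search::QueryParser->new( schema => $searcher->get_schema );
    my $marker_q = $qparser->parse(qq|file_name:"$log_file" AND doc_type:end_marker|);
    my $hits     = $searcher->hits( query => $marker_q, num_wanted => 1 );

    my $indexer  = Lucy::Index::Indexer->new( index => $index_path );
    my $seek_pos = 0;
    if ( my $marker = $hits->next ) {
        $seek_pos = $marker->{seek_pos};         # resume where indexing stopped
        $indexer->delete_by_query($marker_q);    # drop the old end marker
    }

    # ... read the log from $seek_pos and $indexer->add_doc(...) for each new line ...

    # Re-insert the end marker once the file is closed again.
    $indexer->add_doc(
        {   file_name => $log_file,
            doc_type  => 'end_marker',
            seek_pos  => $seek_pos,
        }
    );
    $indexer->commit;
    undef $searcher;    # don't keep using a stale searcher after the commit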

I started seeing this type of error only after I added the search-and-delete logic. Should I catch this error and retry? How many docs should I limit a single commit to?

I would highly appreciate any help.

Thanks,
Rajiv Gupta


Re: [lucy-user] RE: LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Peter Karman


It wasn't clear to me from your description, but you should be running only one indexer at a time per invindex.

You should also destroy any open searchers once the invindex changes (the
indexer commits).

You can see here how Dezi approaches this:

https://metacpan.org/source/KARMAN/Dezi-App-0.014/lib/Dezi/Lucy/Searcher.pm#L406

TL;DR: Dezi keeps track of an invindex header file with a UUID in it, which changes whenever the indexer finishes. Both the UUID and an md5 checksum of the header file are checked on every search, and the searcher is destroyed and a new one created if the old searcher is stale.
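
The refresh logic boils down to something like this (just the idea, not Dezi's actual code; the 'index.version' file here is a stand-in for the invindex header):

    use strict;
    use warnings;
    use Lucy::Search::IndexSearcher;

    my ( $cached_searcher, $cached_version ) = ( undef, '' );

    sub fresh_searcher {
        my $invindex = shift;

        # The indexer is assumed to rewrite this file on every commit.
        my $version = do {
            open my $fh, '<', "$invindex/index.version" or return $cached_searcher;
            local $/;
            <$fh>;
        };

        if ( !$cached_searcher or $version ne $cached_version ) {
            $cached_searcher = Lucy::Search::IndexSearcher->new( index => $invindex );
            $cached_version  = $version;
        }
        return $cached_searcher;
    }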


--
Peter Karman  .  http://peknet.com/  .  [hidden email]

Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Nick Wellnhofer
In reply to this post by Gupta, Rajiv
On 06/12/2016 17:17, Gupta, Rajiv wrote:

> Any idea why I'm getting this error?
>
> Error Invalid path: 'seg_9i/lextemp'
> 20161205 184114 [] [event_check_for_logfile_completion_in_db][FAILED at DB Query to check logfile completion][Error Invalid path: 'seg_9i/lextemp'
> 20161205 184114 []  LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119
> 20161205 184114 []  S_lazy_init at core/Lucy/Index/PostingListWriter.c line 92
>
>
> In another log file I'm getting a different error:
>
> Error rename from '<Dir>/.lucyindex/1/schema.temp' to '<Dir> /.lucyindex/1/schema_an.json' failed: Invalid argument
> 20161205 174146 []  LUCY_Schema_Write_IMP at core/Lucy/Plan/Schema.c line 429
>
> This happens when committing the indexer object.

This looks like two processes are writing to the index at once. This shouldn't
happen unless something with our locking mechanism is broken. Do you have an
unusual setup? Are you perhaps running on NFS?

> In both cases I'm seeing one common pattern: the timestamps in the STDOUT log file get skewed by 5-6 hrs before the process starts on this file, even though the actual system time has not changed.

I don't fully understand this paragraph. Can you clarify?

Nick


Re: [lucy-user] RE: LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Nick Wellnhofer
In reply to this post by Gupta, Rajiv
On 06/12/2016 22:16, Gupta, Rajiv wrote:
> I thought that since I'm doing reads and writes together I might be getting file errors, so I tried the FastUpdates method described here - http://lucy.apache.org/docs/perl/Lucy/Docs/Cookbook/FastUpdates.html

You shouldn't get any filesystem errors when searching and indexing
simultaneously. The only error you might get under normal operation is lock
timeouts. In this case, you should consider fast updates.

But the errors you're seeing indicate a different problem that probably won't
be cured by fast updates.

> But now I'm getting the error below more frequently.
> Error input 57 too high
> 20161206 150630 []  S_fibonacci at core/Lucy/Index/IndexManager.c line 129

This is caused by a known bug. Unfortunately, the fix wasn't committed when it
came up for the first time:

https://lists.apache.org/thread.html/0465759f6eae2108be30c70b490d0f94ab2b5c66bfac2b32c76eb41f@1362406649@%3Cuser.lucy.apache.org%3E

I'll make sure that the fix gets into the next release.

> How many docs should I limit a single commit to?

If you're (re)indexing thousands of documents and don't want searchers to be locked out, you should consider indexing batches of documents and sleeping between batches to allow concurrent searches. I'd start with several hundred documents per batch and sleep for maybe 2 seconds. This thread contains more details:

https://lists.apache.org/thread.html/0adbe8d0b5dd6c7491a8f28008428a39f485fae58ed606475f94c636@1355773975@%3Cuser.lucy.apache.org%3E
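
Something along these lines (untested; the batch size and sleep time are only starting points):

    use Lucy::Index::Indexer;

    my $index_path    = '/path/to/index';    # placeholder
    my @docs_to_index = ();                  # docs queued for (re)indexing
    my $batch_size    = 500;

    while (@docs_to_index) {
        my @batch   = splice @docs_to_index, 0, $batch_size;
        my $indexer = Lucy::Index::Indexer->new( index => $index_path );
        $indexer->add_doc($_) for @batch;
        $indexer->commit;
        sleep 2;    # give concurrent searchers a chance to grab the lock
    }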

Nick


RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Gupta, Rajiv
In reply to this post by Nick Wellnhofer
Thanks Nick for your reply. Thanks Peter too.

> This looks like two processes are writing to the index at once. This shouldn't happen unless something with our locking mechanism is broken. Do you have an unusual setup? Are you perhaps running on NFS?

Yes, I have an unusual setup. Let me try to explain it.

* My application is a test application. It runs many test cases in parallel, which generates a lot of log files. I'm using Lucy to index those log files for faster search, pagination, and summary generation.
* From my application I kick off a lucyindexer script using Open3; it is primarily responsible for indexing all the files while tests are progressing. The output and errors of lucyindexer go to STDOUT, which is redirected to a log file.
* My application generates log files from 4 different sources. The information about newly created log files and about end-of-file events is stored in 4 different tables in our database.
* In my lucyindexer main module I use EV watchers. To monitor the tables I use EV::periodic watchers (5 sec for new entries, 10 sec for file completion), an EV::stat watcher (1 sec) for file changes (which behaves just like a periodic watcher, since EV::stat doesn't work on NFS), and an EV::io watcher to detect the broken pipe so that I exit the indexer process once my test application ends. (A sketch of this layout follows the list.)
* With each watcher, when I get a new log file, the following workflow runs: scan through the file for a very limited set of keywords, opening it and reading line by line, and create a Lucy doc based on defined regular expressions. If the end time for the file is in the DB, insert another special end-marker doc indicating the end of the file; the file is removed from my list after the end marker is added. The end marker also stores the last line number and the last seek pointer. If no end time was found for the file, a 1 sec stat watch keeps looking for new changes and Lucy docs are added incrementally. With every new file that comes in, I use a Lucy search to check whether that file was opened previously; if I find the file name, I take its last line number and seek pointer from the end marker, delete that end-marker doc using the indexer's delete-by-query, and start reading the file for further changes. Once the file is closed again, I insert the end marker again. Once I get the broken pipe from my parent test application, I keep a 2-minute buffer to insert end markers for in-progress files.
* The index directory for all these files is under the same folder, with the fixed name .lucyindex/1. There are multiple files in the same folder, but it is rare (I have never seen it) that they conflict when creating docs. I mention this because one version of the application is already out and generating docs; its problem is that when the same file opens again it re-indexes it in full, which takes time and creates duplicates. That is why I tried adding the search step before adding docs for those files. I could also keep this state in memory, but since the list of files sometimes reaches 100k (for long-running tests) the system runs out of memory and becomes very slow.
* The indexer output and the log files are on an NFS mount.
* I have also observed that EV sometimes ends prematurely (without break being called), but I'm not sure whether that is caused by an indexer error; no error is reported at that time.
* In my viewer application I run forked Lucy searches to consolidate data from all the folders. The list of folders sometimes reaches 1000. I tried PolySearcher but did not find it faster than forking.
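
Here is a rough sketch of the watcher layout described above (callback bodies elided; the path is a placeholder):

    use strict;
    use warnings;
    use EV;

    my $current_log = '/path/to/current.log';    # placeholder

    my $new_file_poll = EV::periodic 0, 5, 0, sub {
        # poll the DB tables for newly created log files
    };
    my $completion_poll = EV::periodic 0, 10, 0, sub {
        # poll the DB for files whose end time has been recorded
    };
    my $change_poll = EV::stat $current_log, 1, sub {
        # re-read from the saved seek pointer and add new Lucy docs
    };
    my $parent_watch = EV::io \*STDIN, EV::READ, sub {
        # the parent test application closed the pipe -- shut the indexer down
        EV::break;
    };

    EV::run;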

My Lucy library version is 0.4.2. I have asked my infra team to upgrade, which may take a month or so.

Here is what is happening in parallel most of the time:

Search->Found->delete doc->add doc->commit
add doc->commit

Thanks for reading this far. I'm open to any suggestions. I really like this framework and see a big opportunity for my company's internal triaging strategy in linking it with product logs for more effective results.

You guys rock!

Thanks,
Rajiv Gupta



RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Gupta, Rajiv
Adding some more errors of different types; they all indicate that the indexer cannot update the doc. I have now put an eval around indexer->commit to catch the errors.

Can't delete 'documents.ix'
20161207 145559 [] *    S_do_consolidate at core/Lucy/Store/CompoundFileWriter.c line 173

Can't open '/u/smoke/presub/logs/cit-cr-setup-rtp.rajivg.1481116120.70557_cmode_1of1/010_cleanup/06_did_bad_happen/.lucyindex/1/seg_a9/documents.ix': Invalid argument
20161207 144109 [] *    LUCY_FSFolder_Local_Open_FileHandle_IMP at core/Lucy/Store/FSFolder.c line 118

What do you suggest: should I put a retry around it, or apply the FastUpdates mechanism? (A sketch of the retry I have in mind is below.)
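
If I go the retry route, the wrapper would look roughly like this (a sketch; the 20 attempts and 1-second pause are arbitrary):

    use Lucy::Index::Indexer;

    my $index_path   = '/path/to/.lucyindex/1';    # placeholder
    my @pending_docs = ();                         # docs parsed but not yet committed

    my $tries = 0;
    ATTEMPT: {
        eval {
            my $indexer = Lucy::Index::Indexer->new( index => $index_path );
            $indexer->add_doc($_) for @pending_docs;
            $indexer->commit;
        };
        if ($@) {
            die "giving up after $tries attempts: $@" if ++$tries >= 20;
            sleep 1;
            redo ATTEMPT;    # rebuild the indexer and try the whole batch again
        }
    }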

Thanks,
Rajiv Gupta




Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Nick Wellnhofer
In reply to this post by Gupta, Rajiv
On 07/12/2016 13:23, Gupta, Rajiv wrote:
> * The indexer output and the log files are on an NFS mount.

Have you read and understood Lucy::Docs::FileLocking? With NFS, you have to
pass an IndexManager object to every indexer and searcher.

     http://lucy.apache.org/docs/perl/Lucy/Docs/FileLocking.html
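
In outline it looks like this (untested; adjust the path to your setup):

    use Lucy::Index::IndexManager;
    use Lucy::Index::Indexer;
    use Lucy::Index::IndexReader;
    use Lucy::Search::IndexSearcher;
    use Sys::Hostname;

    my $index_path = '/path/to/.lucyindex/1';    # placeholder

    # Every writer gets an IndexManager carrying the host name ...
    my $indexer = Lucy::Index::Indexer->new(
        index   => $index_path,
        manager => Lucy::Index::IndexManager->new( host => hostname() ),
    );

    # ... and so does every reader; the searcher then wraps the reader.
    my $reader = Lucy::Index::IndexReader->open(
        index   => $index_path,
        manager => Lucy::Index::IndexManager->new( host => hostname() ),
    );
    my $searcher = Lucy::Search::IndexSearcher->new( index => $reader );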

The fact that the index is on NFS probably also explains the performance
problems you reported earlier.

Nick


RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Gupta, Rajiv
I just now did that.

For my regular doc updates, searches, and deletes I'm using LightMergeManager with my host name. For adding the end-of-file marker I'm using the regular manager, with my host name, to conclude the file. I have also put retries around almost all the commits where I was getting errors. Small runs were fine (they were fine anyway); I have 5 large runs going.

I will update the results here.

Thanks,
Rajiv Gupta



RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Gupta, Rajiv
However, since I'm searching and indexing the files from the same process on the same system, do the host names need to be unique? Should I append something like <hostname>_search, <hostname>_index, <hostname>_delete?

-Rajiv



RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Gupta, Rajiv
I fired off 10+ runs and all went through well except one, where I got this error after replacing the default manager with

manager =>  LightMergeManager->new( host => $self->{_hostname}."DEL"),

ERROR:
20161208 033346 [] * FAIL: FAILED AT initializing the IndexSearcher Couldn't get deletion lock
20161208 033346 [] *    lucy_PolyReader_do_open at core/Lucy/Index/PolyReader.c line 344
20161208 033346 [] *    at /x/eng/bbrtp/users/rajivg/dotdev_052309_4015413_1612060522/test/nate/bin/../lib/NATE/LucyIndexerUtils.pm line 3387.
20161208 033346 [] *    eval {...} called at /x/eng/bbrtp/users/rajivg/dotdev_052309_4015413_1612060522/test/nate/bin/../lib/NATE/LucyIndexerUtils.pm line 3383
20161208 033346 [] *    eval {...} called at /x/eng/bbrtp/users/rajivg/dotdev_052309_4015413_1612060522/test/nate/bin/../lib/NATE/LucyIndexerUtils.pm line 3381

I had wrapped retries around all indexer operations except delete, so now I have put a retry around this as well.

However, search is now slower (after adding PolyReader/IndexReader). I used PolyReader because one of the forum threads mentioned that PolyReader has protection against a memory leak issue.

Any tips on how I can improve performance while using IndexReader?

Thanks much for all your support.

Thanks,
Rajiv Gupta




RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Gupta, Rajiv
I'm getting this error very frequently now :(

BasicFlexGroup0_io_workload/pm_io/.lucyindex/1 :  input 47 too high
S_fibonacci at core/Lucy/Index/IndexManager.c line 129

Is there any workaround?

I'm using LightMergeManager; I'm not sure if the error is because of that. Should I stop using it?

Please help. I'm getting it very frequently now.



Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Nick Wellnhofer
On 09/12/2016 15:01, Gupta, Rajiv wrote:

> I'm getting this error very frequently now :(
>
> BasicFlexGroup0_io_workload/pm_io/.lucyindex/1 :  input 47 too high
> S_fibonacci at core/Lucy/Index/IndexManager.c line 129
>
> Is there any workaround?
>
> I'm using LightMergeManager; I'm not sure if the error is because of that. Should I stop using it?
>
> Please help. I'm getting it very frequently now.

I committed a fix to the 0.4, 0.5, and 0.6 branches. Your best option is to
get one of these branches with Git and recompile Lucy. If you can't do that,
either stop using LightMergeManager, or try the following untested workaround.

Modify LightMergeManager to not call SUPER::recycle:

     package LightMergeManager;
     use base qw( Lucy::Index::IndexManager );

     sub recycle {
         my ( $self, %args ) = @_;
         my $seg_readers = $args{reader}->get_seg_readers;
         @$seg_readers = grep { $_->doc_max < 10 } @$seg_readers;
         return $seg_readers;
     }

Make BackgroundMerger always "optimize" the index before committing:

     $bg_merger->optimize;
     $bg_merger->commit;

> However, search is now slower (after adding PolyReader/IndexReader). I used PolyReader because one of the forum threads mentioned that PolyReader has protection against a memory leak issue.
>
> Any tips on how I can improve performance while using IndexReader?

Using PolyReader or IndexReader shouldn't make a difference performance-wise.
The performance drop is probably caused by supplying an IndexManager to
IndexReader or PolyReader which results in additional overhead from read
locks. You should move the index to a local filesystem if you're concerned
about performance.

> However, since I'm searching and indexing the files from the same process on the same system, do the host names need to be unique? Should I append something like <hostname>_search, <hostname>_index, <hostname>_delete?

No, simply use the hostname without a suffix.

Nick

RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Gupta, Rajiv
Thanks Nick for your help and the workaround. I will ask my infra team to pick up the latest 0.6 and install it. I hope 0.6 works out better than 0.4.

I stopped using LightMergeManager and did not get that error any more; however, performance is now even worse. I'm going to try a few things:

1. Try the workaround you provided. (I don't use a background merger.)
2. Try to run a background merge in another loop along with the above option.
3. Try to store the information in memory/Storable/a DB instead of searching every time. I think that mixing search with doc indexing in the same process is what creates problems; if another system uses search I don't see any problem.
4. Try to serialize work on the index directories to avoid overlap; they are all running as parallel processes anyway.

Hopefully one of the above works out.

Thanks,
Rajiv


RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Gupta, Rajiv
I tried the workaround; now I'm getting the error below at multiple places where I add docs. It bailed out after 20 retries (my default) without adding any docs.

:  input 51 too high
20161210 064458 [] *    S_fibonacci at core/Lucy/Index/IndexManager.c line 129

Am I doing something wrong? The approach I'm following is: while reading the document, use LightMergeManager, and once I reach the end of the file, use the regular index manager to do the final commit. In some cases the number of docs buffered under LightMergeManager becomes too high. I could also add logic so that if the number of buffered docs crosses a certain limit, I trigger a commit at that point (sketched below).
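
Something like this is what I have in mind (a sketch; 500 is an arbitrary cut-off, and LightMergeManager is the subclass from the FastUpdates cookbook):

    use Lucy::Index::Indexer;
    use Lucy::Index::IndexManager;
    use Sys::Hostname;

    my $index_path     = '/path/to/.lucyindex/1';          # placeholder
    my @parsed_docs    = ();                                # docs parsed from the log
    my %end_marker_doc = ( doc_type => 'end_marker' );      # placeholder fields
    my $hostname       = hostname();

    my $light_indexer = sub {
        Lucy::Index::Indexer->new(
            index   => $index_path,
            manager => LightMergeManager->new( host => $hostname ),
        );
    };

    my $indexer  = $light_indexer->();
    my $buffered = 0;
    for my $doc (@parsed_docs) {
        $indexer->add_doc($doc);
        if ( ++$buffered >= 500 ) {    # don't let the buffer grow unbounded
            $indexer->commit;
            $indexer  = $light_indexer->();    # open a fresh Indexer after each commit
            $buffered = 0;
        }
    }
    $indexer->commit;

    # End of file: final commit with the regular IndexManager so segments get merged.
    my $final = Lucy::Index::Indexer->new(
        index   => $index_path,
        manager => Lucy::Index::IndexManager->new( host => $hostname ),
    );
    $final->add_doc( \%end_marker_doc );
    $final->commit;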

Thanks,
Rajiv Gupta


Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Nick Wellnhofer
On 10/12/2016 15:26, Gupta, Rajiv wrote:
> I will ask my infra team to pick up the latest 0.6 and install it. I hope 0.6 works out better than 0.4.

Note that the fix to IndexManager isn't in any released version of Lucy yet.
You'll get the same error with 0.6.0. Either compile from the 0.6 Git branch
or wait until 0.6.1 is released.

Nick

RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Gupta, Rajiv
Any timeline for when 0.6.1 will be released?


Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Nick Wellnhofer
On 10/12/2016 17:25, Gupta, Rajiv wrote:
> Any timeline for when 0.6.1 will be released?

0.6.1 is on schedule to be released in a few days.

Nick


RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Gupta, Rajiv
Can't open '/u/smoke/presub/logs/cit-fg-adr-neg-rtp.rajivg.1481473130.41339_cmode_1of1/.lucyindex/1/seg_fd/lexicon-7.ixix': Invalid argument
20161211 182109 [] *    LUCY_FSFolder_Local_Open_FileHandle_IMP at core/Lucy/Store/FSFolder.c line 118
20161211 182109 [] *    LUCY_Folder_Local_Open_In_IMP at core/Lucy/Store/Folder.c line 101
20161211 182109 [] *    LUCY_Folder_Open_In_IMP at core/Lucy/Store/Folder.c line 75

There are two more failures; they failed for similar reasons:

rename from '/u/smoke/presub/logs/cit-fg-adr-ndo-rtp.rajivg.1481473039.49384_cmode_1of1/.lucyindex/1/schema.temp' to '/u/smoke/presub/logs/cit-fg-adr-ndo-rtp.rajivg.1481473039.49384_cmode_1of1/.lucyindex/1/schema_e4.json' failed: No such file or directory

Can't delete 'lexicon-3.ix'

I believe all three are related to a race condition while doing parallel indexing and should go away with retries. However, my retries started failing with a different error, which is strange to me: if the directory already exists, shouldn't it skip the create attempt?

20161211 182109 [] * FAIL: [FAILED]: Retrying to add doc at path: /u/smoke/presub/logs/cit-fg-adr-neg-rtp.rajivg.1481473130.41339_cmode_1of1/.lucyindex/1 :  Couldn't create directory '/u/smoke/presub/logs/cit-fg-adr-neg-rtp.rajivg.1481473130.41339_cmode_1of1/.lucyindex/1': No such file or directory
20161211 182109 [] *    LUCY_FSFolder_Initialize_IMP at core/Lucy/Store/FSFolder.c line 102

So all my retry attempts failed as well.


Now, before retrying, I have added one more check for whether the index directory exists before index creation :( (see the sketch below)
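
The extra guard is basically this (a sketch; File::Path is core Perl and the path is a placeholder):

    use File::Path qw( make_path );

    my $index_path = '/path/to/.lucyindex/1';    # placeholder

    # Recreate the index directory if it has gone missing before retrying the add.
    make_path($index_path) unless -d $index_path;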

My failure rate is now 7/10; the target is to achieve at least 9/10.

-Rajiv




RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Gupta, Rajiv
Another thing I have not been able to figure out is how the time zone gets changed after the failure.

20161211 125058 [] [(_start_indexing_file) File to index: /u/smoke/presub/logs/cit-fg-adr-ndo-rtp.rajivg.1481473039.49384_cmode_1of1/001_check_log_size.log Save Index location: /u/smoke/presub/logs/cit-fg-adr-ndo-rtp.rajivg.1481473039.49384_cmode_1of1/.lucyindex/1 FileSeek pointer start : 17794 Final Flag: 1
20161211 175059 [] ****************************************
20161211 175059 [] * FAIL: [FAILED]: Retrying to add doc at path: /u/smoke/presub/logs/cit-fg-adr-ndo-rtp.rajivg.1481473039.49384_cmode_1of1/.lucyindex/1 :  rename from '/u/smoke/presub/logs/cit-fg-adr-ndo-rtp.rajivg.1481473039.49384_cmode_1of1/.lucyindex/1/schema.temp' to '/u/smoke/presub/logs/cit-fg-adr-ndo-rtp.rajivg.1481473039.49384_cmode_1of1/.lucyindex/1/schema_e4.json' failed: No such file or directory
20161211 175059 [] *    LUCY_Schema_Write_IMP at core/Lucy/Plan/Schema.c line 429
20161211 175059 [] *    at /x/eng/bbrtp/users/rajivg/dotdev_052309_4015413_1612060522/test/nate/bin/../lib/NATE/LucyIndexerUtils.pm line 3278, <$fhlogfile> line 46.
20161211 175059 [] *    eval {...} called at /x/eng/bbrtp/users/rajivg/dotdev_052309_4015413_1612060522/test/nate/bin/../lib/NATE/LucyIndexerUtils.pm line 3265
20161211 175059 [] *    NATE::LucyIndexerUtils::_lucy_add_doc('NATE::LucyIndexerUtils=HASH(0x2917050)', 'HASH(0x3116398)') called at /x/eng/bbrtp/users/rajivg/dotdev_052309_4015413_1612060522/test/nate/bin/../lib/NATE/LucyIndexerUtils.pm line 3169

It is shifting by exactly 5 hrs from EST. These logs are just STDOUT redirected to one file, and I'm using localtime(time()) and formatting it to generate the time string.

Thanks,
Rajiv Gupta



