[lucy-dev] Improving Lucy's locking code

[lucy-dev] Improving Lucy's locking code

Nick Wellnhofer
Lucifers,

I had a close look at Lucy's locking code and while I didn't find any glaring
problems, I think it could be improved in some places. First of all, I'd like
to use a locking mechanism provided by the OS instead of rolling our own. The
main upside is that we can detect reliably whether a process holding a lock
has crashed without relying on a heuristic. On UNIX there are:

fcntl: This is specified in POSIX and should exist on every modern UNIX
platform. The main problem is that fcntl identifies locks by a (pid, inode)
pair. This means that you can only hold a single lock on a file in a process.
If you close a lock file, any locks obtained via other file descriptors will
be released, too. This is especially dangerous in multi-threaded applications
and AFAICS can only be worked around with refcounted global locks protected by
a mutex.
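
A minimal sketch of that pitfall (file name illustrative, error handling
omitted):

    #include <fcntl.h>
    #include <unistd.h>

    static void fcntl_pitfall(void) {
        int fd1 = open("index.lock", O_RDWR | O_CREAT, 0644);
        struct flock fl = {0};
        fl.l_type   = F_WRLCK;   /* exclusive write lock */
        fl.l_whence = SEEK_SET;  /* l_start/l_len of 0 = whole file */
        fcntl(fd1, F_SETLK, &fl);

        /* Elsewhere in the same process, the file is opened and closed: */
        int fd2 = open("index.lock", O_RDONLY);
        close(fd2);  /* Releases the lock acquired through fd1, because
                      * fcntl locks belong to the (pid, inode) pair,
                      * not to a file descriptor. */
        close(fd1);
    }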

lockf: Typically just a wrapper around fcntl.

flock: This is a BSD interface that also works on Linux. Can be safely used
with threads. Requires Linux 2.6.12 to work with NFS.
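
For illustration, a non-blocking exclusive lock with flock might look like
this (just a sketch, not Lucy code):

    #include <fcntl.h>
    #include <sys/file.h>
    #include <unistd.h>

    static int try_exclusive(const char *path) {
        int fd = open(path, O_RDWR | O_CREAT, 0644);
        if (fd < 0) { return -1; }
        if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
            close(fd);   /* Lock held elsewhere (or error). */
            return -1;
        }
        /* The lock lives on the open file description: other threads
         * and descriptors are unaffected, and the OS releases it on
         * close() or if the process crashes. */
        return fd;
    }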

On Windows, exclusive (and mandatory) locks can be obtained when opening a
file with CreateFile, shared locks with LockFileEx.
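
For example (again just a sketch, error handling omitted):

    #include <windows.h>

    /* Open with an empty share mode: a mandatory, exclusive lock for
     * the lifetime of the handle. */
    static HANDLE open_exclusive(const char *path) {
        return CreateFileA(path, GENERIC_READ | GENERIC_WRITE,
                           0 /* no sharing */, NULL, OPEN_ALWAYS,
                           FILE_ATTRIBUTE_NORMAL, NULL);
    }

    /* Take a shared, non-blocking lock on the whole file.  Omitting
     * LOCKFILE_EXCLUSIVE_LOCK makes the lock shared. */
    static BOOL lock_shared(HANDLE h) {
        OVERLAPPED ov = {0};
        return LockFileEx(h, LOCKFILE_FAIL_IMMEDIATELY, 0,
                          MAXDWORD, MAXDWORD, &ov);
    }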

Regarding NFS, you can find lots of information online about certain
implementations not supporting certain locking APIs. But it seems to me that
this is mostly anecdotal. My guess is that modern systems "just work" with
either fcntl or flock.

To get started, I'd propose to implement a locking backend based on flock that
is the default for local operation and optional on shared volumes.

Some other things I noted:

1. Lock_Is_Locked is in general prone to race conditions.

1.1. It is used to test the merge lock when indexing and pruning, which is OK
because the write lock is held. But it makes it hard to reason about the code.

1.2. Testing NFS snapshot read locks with ShLock_Is_Locked is inherently racy.
If it turns out that shared locks based on flock or fcntl work reliably on
somewhat modern NFS setups, I'd love to remove Is_Locked and the current
SharedLock implementation completely.
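
In pseudocode (in the style of the sketches below), the race in point 1 is
the classic check-then-act pattern:

    if (!Lock_Is_Locked(some_lock)) {
        // <-- Another process can obtain some_lock right here,
        //     between the check and whatever we do next.
        // Proceed on the assumption that nobody holds the lock...
    }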

2. The background merger first obtains the write lock, then the merge lock,
then releases the write lock and acquires it again. With multiple background
mergers and no timeouts, this could lead to a classic deadlock. It isn't a
problem in practice because typically there's only a single merger and we have
timeouts. But again, it makes the code hard to reason about.
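
For the record, the problematic interleaving with two mergers and no timeouts
would be:

    Merger A: obtains write lock, then merge lock; releases write lock
              and starts merging.
    Merger B: obtains write lock; waits for the merge lock (held by A).
    Merger A: finishes merging; waits to reacquire the write lock
              (held by B).  Neither can proceed.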

Here's a sketch of how I'd rework the code. The key idea is to separate the
purging of snapshots and dead merges. Purging of dead merges is error recovery
and should be done before modifying the index. Purging snapshots is routine
cleanup that should be done only after modifying the index.

     Indexer() {
         if (Lock_Request(merge_lock)) {
             // No active background merger.
             Lock_Obtain(write_lock);
                 // Purge dead merge if merge.json exists.
             Lock_Release(write_lock);
             Lock_Release(merge_lock);
         }

         Lock_Obtain(write_lock);
             // Test for and read merge.json without checking
             // the merge lock.
             // Add documents.
             // Commit.
             // Purge snapshots (but not dead merge).
         Lock_Release(write_lock);
     }

     BackgroundMerger() {
         Lock_Obtain(merge_lock);
             Lock_Obtain(write_lock);
                 // Purge dead merge if merge.json exists.
                 // Set up.
                 // Write merge.json.
             Lock_Release(write_lock);
             // Merge.
             Lock_Obtain(write_lock);
                 // Commit.
                 // Remove merge.json.
                 // Purge snapshots (but not dead merge).
             Lock_Release(write_lock);
         Lock_Release(merge_lock);
     }

Benefits:

- No need for Is_Locked.
- Consistent locking order with outer merge lock and
   inner write lock.
- Don't purge twice when indexing and background merging.

Nick

Re: [lucy-dev] Improving Lucy's locking code

Marvin Humphrey
On Wed, Feb 15, 2017 at 5:42 AM, Nick Wellnhofer <[hidden email]> wrote:

> I had a close look at Lucy's locking code and while I didn't find any
> glaring problems, I think it could be improved in some places.

If you really feel like taking this on, go ahead -- but let me describe some
of the history and also suggest a slightly different way forward: consider
starting with DeletionPolicy.

Lucy's read locking is only needed on NFS, and only to prevent `Stale NFS
filehandle` exceptions.  If we introduce the ability to control when
FilePurger deletes files via a DeletionPolicy interface, then SharedLock can
go away entirely.

NFS users would be expected to supply their own DeletionPolicy such as one
delaying deletion of a snapshot and its associated files for N seconds after
it is superseded by a new snapshot -- enough time for any current search to
complete and to refresh the Searcher before the next search.
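
To sketch the idea -- all names below are hypothetical, nothing here exists
in Lucy today -- the hook might look like:

    #include <stdbool.h>
    #include <time.h>

    /* Hypothetical interface: FilePurger would consult the policy
     * before deleting the files of a superseded snapshot. */
    typedef struct DeletionPolicy DeletionPolicy;
    struct DeletionPolicy {
        bool (*can_purge)(DeletionPolicy *self, const char *snapshot,
                          time_t superseded_at);
    };

    /* Example policy: keep superseded snapshots for delay_secs, long
     * enough for in-flight NFS searches to finish. */
    typedef struct {
        DeletionPolicy policy;
        time_t         delay_secs;
    } DelayedDeletion;

    static bool
    DelayedDeletion_can_purge(DeletionPolicy *self, const char *snapshot,
                              time_t superseded_at) {
        DelayedDeletion *dd = (DelayedDeletion*)self;
        (void)snapshot;
        return time(NULL) >= superseded_at + dd->delay_secs;
    }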

Once Lucy's locking mechanism is simplified to only handle write locking, the
prospect of reworking it with native locks gets slightly less scary.

(Another advantage of DeletionPolicy is that it can help to ensure that a
snapshot on a remote search box remains available -- a feature which is needed
by ClusterSearcher.)

> First of all,
> I'd like to use a locking mechanism provided by the OS instead of rolling
> our own. The main upside is that we can detect reliably whether a process
> holding a lock has crashed without relying on a heuristic. On UNIX there
> are:
>
> fcntl: This is specified in POSIX and should exist on every modern UNIX
> platform. The main problem is that fcntl identifies locks by a (pid, inode)
> pair. This means that you can only hold a single lock on a file in a
> process. If you close a lock file, any locks obtained via other file
> descriptors will be released, too. This is especially dangerous in
> multi-threaded applications and AFAICS can only be worked around with
> refcounted global locks protected by a mutex.
>
> lockf: Typically just a wrapper around fcntl.
>
> flock: This is a BSD interface that also works on Linux. Can be safely used
> with threads. Requires Linux 2.6.12 to work with NFS.
>
> On Windows, exclusive (and mandatory) locks can be obtained when opening a
> file with CreateFile, shared locks with LockFileEx.

Native locking is a tangled mess of portability nightmares.  As you (and many
others) have noted, POSIX `fcntl` is a problematic API.  However, an
`flock`-based mechanism won't work on Solaris and its descendants -- if
`flock` is present at all, it is just a wrapper around `fcntl`.

    https://perkin.org.uk/posts/solaris-portability-flock.html

That means Lucy would need at least 3 different locking
implementations:

*   `flock` on Linux/BSD/Darwin/most-other-unix
*   `CreateFile`/`LockFileEx` on Windows
*   The current system (?) on Solaris/SmartOS/etc

The advantage of the current lockfile-based system is that it works pretty
much the same way everywhere, except on old Windows systems that don't support
hard links (where Lucy doesn't work at all).  Because lock files are visible
on the file system and can outlive crashed processes, it's also easier to see
what the locking system is doing.  Native locks, in contrast, are invisible,
so when they fail under obscure action-at-a-distance circumstances -- another
thread releasing an `fcntl` lock, or a forked process failing to inherit
`fcntl` locks from its parent -- it's hard to track down what happened.

> Regarding NFS, you can find lots of information online about certain
> implementations not supporting certain locking APIs. But it seems to me that
> this is mostly anecdotal. My guess is that modern systems "just work" with
> either fcntl or flock.

In my career as a software developer, NFS competes with Internet Explorer 6 as
the biggest waster of my time, and it remains a fundamentally flawed design.
But it may be that NFS support for file locking has improved to the point
where it's good enough for our purposes.  See the 2015 update to this 2000
article:

    https://www.time-travellers.org/shane/papers/NFS_considered_harmful.html

    For example, Carsten Grohmann contacted me recently and reports:

        Currently parts of the documentation are outdated like the locking
        section. We use NFS locks extensively on RHEL5 and RHEL7 together with
        NetApp and it works really well.

> Some other things I noted:
>
> 1. Lock_Is_Locked is in general prone to race conditions.

Of course.

> 1.1. It is used to test the merge lock when indexing and pruning which is OK
> because the write lock is held. But it makes it hard to reason about the
> code.

Indeed.  The merging logic is hairy -- especially the part about merging
updated deletions.

> 1.2. Testing NFS snapshot read locks with ShLock_Is_Locked is inherently
> racy. If it turns out that shared locks based on flock or fcntl work
> reliably on somewhat modern NFS setups, I'd love to remove Is_Locked and the
> current SharedLock implementation completely.

It would be great to ditch the current SharedLock entirely. :) I'd advocate
DeletionPolicy as more portable and reliable than reimplementing SharedLock
with native lock systems and their uncertain semantics.  Additionally, the
DeletionPolicy strategy does not require agreement between search and
indexing processes about what lock type is in use (e.g. lock files vs. native
locks).

> 2. The background merger first obtains the write lock, then the merge lock,
> then releases the write lock and acquires it again. With multiple background
> mergers and no timeouts, this could lead to a classic deadlock. It isn't a
> problem in practice because typically there's only a single merger and we
> have timeouts. But again, it makes the code hard to reason about.

True.

> Here's a sketch of how I'd rework the code. The key idea is to separate the
> purging of snapshots and dead merges. Purging of dead merges is error
> recovery and should be done before modifying the index.  Purging snapshots is
> routine cleanup that should be done only after modifying the index.

What you've described is the intent of what we do now: for both Indexer
and BackgroundMerger, we call FilePurger_Purge() once at the start to clean up
dead merges, then call FilePurger_Purge() again at the end to get rid of old
snapshots.  It is true that Purge() could be split, eliminating some
unnecessary (though currently safe) cleanup operations.

It's also worth mentioning that SegWriter_Prep_Seg_Dir(), which both Indexer
and BackgroundMerger run, nukes its target seg dir with Folder_Delete_Tree()
before starting fresh.  This cleans up after both failed indexing and
merging sessions[1].

>     Indexer() {
>         if (Lock_Request(merge_lock)) {
>             // No active background merger.
>             Lock_Obtain(write_lock);
>                 // Purge dead merge if merge.json exists.
>             Lock_Release(write_lock);
>             Lock_Release(merge_lock);
>         }
>
>         Lock_Obtain(write_lock);
>             // Test for and read merge.json without checking
>             // the merge lock.
>             // Add documents.
>             // Commit.
>             // Purge snapshots (but not dead merge).
>         Lock_Release(write_lock);
>     }

I believe this will work.  The key is that the write lock is held by
BackgroundMerger when merge.json is written and by Indexer when
merge.json is read.

>     BackgroundMerger() {
>         Lock_Obtain(merge_lock);
>             Lock_Obtain(write_lock);
>                 // Purge dead merge if merge.json exists.
>                 // Set up.
>                 // Write merge.json.
>             Lock_Release(write_lock);
>             // Merge.
>             Lock_Obtain(write_lock);
>                 // Commit.
>                 // Remove merge.json.
>                 // Purge snapshots (but not dead merge).
>             Lock_Release(write_lock);
>         Lock_Release(merge_lock);
>     }

This looks sane.

> Benefits:
>
> - No need for Is_Locked.
> - Consistent locking order with outer merge lock and
>   inner write lock.
> - Don't purge twice when indexing and background merging.

Sounds good, if you're willing. :)

Marvin Humphrey

[1] Nuking the target segment in SegWriter_Prep_Seg_Dir() also has a happy
    incidental effect of nearly always preventing index corruption when
    multiple Indexers from different machines overwrite each other on NFS.
    Having temp files wiped is generally going to cause finishing and
    consolidation operations called from SegWriter_Finish() to fail, so the
    first Indexer throws an exception before it gets a chance to commit a
    damaged segment.

Re: [lucy-dev] Improving Lucy's locking code

Peter Karman
Nick Wellnhofer wrote on 2/15/17 7:42 AM:
> Lucifers,
>
> I had a close look at Lucy's locking code and while I didn't find any glaring
> problems, I think it could be improved in some places. First of all, I'd like to
> use a locking mechanism provided by the OS instead of rolling our own. The main
> upside is that we can detect reliably whether a process holding a lock has
> crashed without relying on a heuristic.

Not to dissuade you from rewriting all the locking logic, since Marvin has
given his far more insightful accounting on that topic, but I wonder if there
is a simpler way to solve the problem you raise: reliably detecting whether a
process holding a lock has crashed.

https://metacpan.org/pod/Path::Class::File::Lockable is how I have approached
that in my Perl code, by writing the PID to the lock file and then checking if
the PID is still alive when I encounter the lock later.  For example:

https://github.com/publicinsightnetwork/audience-insight-repository/blob/master/bin/indexer#L121
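
The liveness check itself usually boils down to the kill(pid, 0) idiom
(a sketch, not Lucy's actual code):

    #include <sys/types.h>
    #include <signal.h>
    #include <errno.h>

    /* Signal 0 performs only the existence and permission checks. */
    static int pid_is_alive(pid_t pid) {
        if (kill(pid, 0) == 0) { return 1; }  /* process exists */
        return errno == EPERM;                /* exists, but not ours */
    }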

We already write the PID to Lucy lock files, so we can check if the process that
created the lock is still running, yes?

Or is that the very heuristic that you're wanting to move away from?

--
Peter Karman  .  https://peknet.com/  .  https://keybase.io/peterkarman

Re: [lucy-dev] Improving Lucy's locking code

Nick Wellnhofer
On 18/02/2017 03:48, Peter Karman wrote:
> We already write the PID to Lucy lock files, so we can check if the process
> that created the lock is still running, yes?
>
> Or is that the very heuristic that you're wanting to move away from?

Yes, because it's unreliable. We don't detect whether another unrelated
process happens to reuse the PID. For example:

- We have an Indexer with PID 42.
- The machine crashes during indexing, leaving a lockfile with PID 42.
- The machine restarts and happens to assign PID 42 to another process
   before an Indexer runs.
- Any new Indexer will be locked out as long as this other process is
   running.

Right now, the only remedy is to manually delete the lock file. Fortunately,
this scenario is unlikely if only the indexing process terminates abnormally,
because PIDs won't get reused until they wrap around. Even if there's a system
crash, there's a good chance that an Indexer is started before the old PID is
reused.

On a shared volume like NFS, this problem is more pronounced. A single machine
that goes down or loses its network connection at the wrong moment will block
all other indexers until it gets back up and starts an indexing session.

Native locks are released by the operating system if a process crashes. With
modern client and server implementations, this even works on NFS after a
certain timeout. On top of that, native shared locks should be faster on NFS.

Nick

Re: [lucy-dev] Improving Lucy's locking code

Nick Wellnhofer
On 18/02/2017 01:37, Marvin Humphrey wrote:
> Lucy's read locking is only needed on NFS, and only to prevent `Stale NFS
> filehandle` exceptions.

I thought so, too, but then I realized that BackgroundMerger also protects the
snapshot it's working on with a read lock.

> That means Lucy would need at least 3 different locking
> implementations:
>
> *   `flock` on Linux/BSD/Darwin/most-other-unix
> *   `CreateFile`/`LockFileEx` on Windows
> *   The current system (?) on Solaris/SmartOS/etc

Yes, but Windows/UNIX could share the same Lock subclass.

> In my career as a software developer, NFS competes with Internet Explorer 6 as
> the biggest waster of my time,

;)

> Sounds good, if you're willing. :)

Here's my work-in-progress branch on Github:

     https://github.com/nwellnhof/lucy/commits/locky-mclockface

I realized that we have to align the Lock class with native lock semantics
first. So I merged SharedLock into LockFileLock, providing Request_Shared and
Request_Exclusive methods (note that GitHub collapses the crucial
core/Lucy/Store/Lock.c diff):

     https://github.com/nwellnhof/lucy/commit/f390bc51cb8964606084287ccc8e9258d7a5e96b

This allows us to remove the global deletion lock:

     https://github.com/nwellnhof/lucy/commit/42ca7d3002b69134f261b8c555cb99a3cc17cce3

The changes to the merge lock logic are minor (after the initial FilePurger
change):

     https://github.com/nwellnhof/lucy/commit/323e4472a629d5fd64e4678d4921ade1d521d1d9

I also found a couple of issues with the existing code, but nothing too serious.

It should be easy to implement native locks on top of these changes. The
hardest part will probably be making sure that all users of an index agree on
a single locking mechanism.

Nick

Re: [lucy-dev] Improving Lucy's locking code

Peter Karman
Nick Wellnhofer wrote on 2/18/17 9:22 AM:
> On 18/02/2017 03:48, Peter Karman wrote:
>> We already write the PID to Lucy lock files, so we can check if the process
>> that created the lock is still running, yes?
>>
>> Or is that the very heuristic that you're wanting to move away from?
>
> Yes, because it's unreliable. We don't detect whether another unrelated process
> happens to reuse the PID. For example:

+1


--
Peter Karman  .  https://peknet.com/  .  https://keybase.io/peterkarman