LocatedFileStatusFetcher.getFileStatuses failing intermittently with s3

Steve Loughran-4
FYI, HADOOP-16458: LocatedFileStatusFetcher.getFileStatuses failing
intermittently with s3

This is inevitably something up with S3A, but I'm going to be making
changes to the LocatedFileStatusFetcher code as well as o.a.h.fs.Globber to
help diagnose this, so it's stepping into MAPREDUCE land.

Two questions.

- There are no explicit unit tests of LocatedFileStatusFetcher doing scans
of object stores or filesystems. Is there anything I've not seen?
- The FileSystem globber has code which, if a listStatus(path) returns a
single entry, calls getFileStatus to get some more information, which the
docs say is "needed to handle symlinks".

I don't know where we are with symlinks right now, because they aren't in
any object store, and are disabled for HDFS.

What would people think if I removed that secondary check?

I may play with some subclassing games and just remove it for S3A, so it's
lower risk, while improving perf slightly. ABFS could copy.
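
To make the subclassing idea concrete, here is a rough, self-contained
sketch. It does not use the Hadoop API: CountingStore, SketchGlobber,
NoSymlinkGlobber and the recheckSingleEntry() hook are all hypothetical
names invented for illustration, and re-checking the listed path (rather
than the returned entry) is an assumption about what the secondary check
looks at. It only shows how a subclass could skip the extra getFileStatus
call and save one request per single-entry listing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GlobberSketch {

    /** Hypothetical stand-in for a file status; not Hadoop's FileStatus. */
    static final class Status {
        final String path;
        final boolean isDirectory;
        Status(String path, boolean isDirectory) {
            this.path = path;
            this.isDirectory = isDirectory;
        }
    }

    /** Hypothetical in-memory store that counts the calls made against it. */
    static class CountingStore {
        private final Map<String, Boolean> entries = new TreeMap<>();
        int listCalls = 0;
        int statCalls = 0;

        void add(String path, boolean isDirectory) {
            entries.put(path, isDirectory);
        }

        /** Returns the direct children of dir. */
        List<Status> listStatus(String dir) {
            listCalls++;
            String prefix = dir.endsWith("/") ? dir : dir + "/";
            List<Status> out = new ArrayList<>();
            for (Map.Entry<String, Boolean> e : entries.entrySet()) {
                String p = e.getKey();
                if (p.startsWith(prefix) && p.indexOf('/', prefix.length()) < 0) {
                    out.add(new Status(p, e.getValue()));
                }
            }
            return out;
        }

        Status getFileStatus(String path) {
            statCalls++;
            Boolean dir = entries.get(path);
            if (dir == null) {
                throw new IllegalArgumentException("No such path: " + path);
            }
            return new Status(path, dir);
        }
    }

    /** Simplified globber step with a hook for the secondary check. */
    static class SketchGlobber {
        protected final CountingStore store;
        SketchGlobber(CountingStore store) { this.store = store; }

        /** Hook: should a single-entry listing trigger a second lookup? */
        protected boolean recheckSingleEntry() { return true; }

        List<Status> list(String dir) {
            List<Status> listing = store.listStatus(dir);
            if (listing.size() == 1 && recheckSingleEntry()) {
                // Assumed behaviour: re-stat the listed path to disambiguate.
                Status listed = store.getFileStatus(dir);
                if (!listed.isDirectory) {
                    return new ArrayList<>();
                }
            }
            return listing;
        }
    }

    /** Variant for stores without symlinks: skips the secondary check. */
    static class NoSymlinkGlobber extends SketchGlobber {
        NoSymlinkGlobber(CountingStore store) { super(store); }
        @Override protected boolean recheckSingleEntry() { return false; }
    }

    /** Total store calls needed to list a directory holding one file. */
    public static int requestsFor(boolean recheck) {
        CountingStore store = new CountingStore();
        store.add("/data", true);
        store.add("/data/part-0000", false);
        SketchGlobber g = recheck
            ? new SketchGlobber(store)
            : new NoSymlinkGlobber(store);
        g.list("/data");
        return store.listCalls + store.statCalls;
    }

    public static void main(String[] args) {
        System.out.println("with recheck: " + requestsFor(true));
        System.out.println("without recheck: " + requestsFor(false));
    }
}
```

Keeping the default in the base class and overriding the hook only in the
store-specific subclass is what would keep the risk low: nothing changes
for filesystems that might still rely on the check.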

Any thoughts?