LocatedFileStatusFetcher.getFileStatuses failing intermittently with s3
FYI, HADOOP-16458 : LocatedFileStatusFetcher.getFileStatuses failing
intermittently with s3
This is inevitably something up with S3A, but I'm going to be making
changes to the LocatedFileStatusFetcher code as well as o.a.h.fs.Globber to
help diagnose this, so it's stepping into MAPREDUCE land.
- There are no explicit unit tests of LocatedFileStatusFetcher doing scans
of object stores or filesystems. Is there anything I've not seen?
- The FileSystem globber has code which, if a listStatus(path) call returns
a single entry, calls getFileStatus to get some more information, which the
docs say is "needed to handle symlinks".
I don't know where we are with symlinks right now, because they aren't in
any object store, and are disabled for HDFS.
What would people think if I actually removed that secondary check?
I may play with some subclassing games and just remove it for S3A, so it's
lower risk, while improving perf slightly. ABFS could copy the approach.
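
To make the subclassing idea concrete, here is a rough sketch of what I mean.
This is not the real Globber or FileSystem code: CountingStore, GlobStep,
NoSymlinkGlobStep and needsSecondaryProbe() are all hypothetical stand-ins,
and the path is made up. It only shows the shape: the secondary getFileStatus
call sits behind an overridable hook, and a store without symlinks turns it off.

```java
import java.util.ArrayList;
import java.util.List;

public class GlobProbeSketch {

    /** Hypothetical stand-in for a filesystem client; it only counts calls. */
    static class CountingStore {
        int listCalls = 0;
        int statCalls = 0;

        /** Pretends the path is a single file and returns it as the only entry. */
        List<String> listStatus(String path) {
            listCalls++;
            List<String> result = new ArrayList<>();
            result.add(path);
            return result;
        }

        /** Pretends to fetch fuller status for one path. */
        String getFileStatus(String path) {
            statCalls++;
            return path;
        }
    }

    /** Hypothetical globber step with the secondary check behind a hook. */
    static class GlobStep {
        /** Assumption: the default keeps today's behaviour, so other stores are unaffected. */
        boolean needsSecondaryProbe() {
            return true;
        }

        List<String> resolve(CountingStore store, String path) {
            List<String> entries = store.listStatus(path);
            if (entries.size() == 1 && needsSecondaryProbe()) {
                // The extra call the docs describe as "needed to handle symlinks".
                entries.set(0, store.getFileStatus(entries.get(0)));
            }
            return entries;
        }
    }

    /** Variant for a store with no symlinks: skips the second round trip. */
    static class NoSymlinkGlobStep extends GlobStep {
        @Override
        boolean needsSecondaryProbe() {
            return false;
        }
    }

    public static void main(String[] args) {
        CountingStore a = new CountingStore();
        new GlobStep().resolve(a, "s3a://bucket/file");
        System.out.println("default:    list=" + a.listCalls + " stat=" + a.statCalls);

        CountingStore b = new CountingStore();
        new NoSymlinkGlobStep().resolve(b, "s3a://bucket/file");
        System.out.println("no-symlink: list=" + b.listCalls + " stat=" + b.statCalls);
    }
}
```

In this toy version the saving is one remote call per single-entry listing;
whether the real code path allows the hook to be placed this cleanly is
something I'd still have to check.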