[SPARK-19812] YARN shuffle service fails to relocate recovery DB across NFS directories

## What changes were proposed in this pull request?

Change from using Java's Files.move to using Hadoop filesystem operations to move the directories. Files.move does not work when moving non-empty directories across NFS mounts; its documentation says that a directory with entries requires a recursive move in that case. We are already using the Hadoop filesystem here, so just use its local filesystem, which handles this properly.

Note that the DB here is actually a directory of files and not just a single file, hence the rename of the parameter from dbFileName to dbName.
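The "recursive move" mentioned above can be sketched in plain Java as follows. This is not what the patch does (the patch delegates to `fs.rename` on Hadoop's local filesystem); it only illustrates why a single `Files.move` call is not enough for a non-empty directory on another mount. `RecursiveMove`, `moveTree`, `copyTree`, and `deleteTree` are hypothetical names, and the sketch assumes the target does not exist yet while its parent directory does.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

/** Hypothetical helper, not part of the patch: moves a directory tree by copy-then-delete. */
final class RecursiveMove {

  /** Tries a plain move first; on failure, copies the tree and then deletes the source. */
  static void moveTree(Path source, Path target) throws IOException {
    try {
      // Succeeds when the move is a simple rename, e.g. within one mount.
      Files.move(source, target);
      return;
    } catch (IOException e) {
      // Files.move fails when the directory entries themselves would have to be moved.
      // Fall through to the recursive copy; it rethrows if the problem was something else.
    }
    copyTree(source, target);
    deleteTree(source);
  }

  /** Copies every directory and file under source to the same relative place under target. */
  static void copyTree(Path source, Path target) throws IOException {
    Files.walkFileTree(source, new SimpleFileVisitor<Path>() {
      @Override
      public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
          throws IOException {
        // Throws FileAlreadyExistsException if the target directory is already there.
        Files.createDirectory(target.resolve(source.relativize(dir)));
        return FileVisitResult.CONTINUE;
      }

      @Override
      public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
          throws IOException {
        Files.copy(file, target.resolve(source.relativize(file)));
        return FileVisitResult.CONTINUE;
      }
    });
  }

  /** Deletes files first, then each directory once it has been emptied. */
  static void deleteTree(Path root) throws IOException {
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
      @Override
      public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
          throws IOException {
        Files.delete(file);
        return FileVisitResult.CONTINUE;
      }

      @Override
      public FileVisitResult postVisitDirectory(Path dir, IOException exc)
          throws IOException {
        if (exc != null) {
          throw exc;
        }
        Files.delete(dir);
        return FileVisitResult.CONTINUE;
      }
    });
  }
}
```

Reusing the Hadoop local filesystem that the class already depends on avoids maintaining a helper like this inside the shuffle service.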

## How was this patch tested?

Ran the YarnShuffleServiceSuite unit tests. Unfortunately, a new test could not easily be added here since it would involve NFS.
Ran manual tests to verify that the DB directories were properly moved across NFS-mounted directories. This change has been running internally for weeks.

Author: Tom Graves <[email protected]>

Closes apache#17748 from tgravescs/SPARK-19812.
tgravescs authored and Tom Graves committed Apr 26, 2017
1 parent 7a36525 commit 7fecf51
Showing 1 changed file with 13 additions and 10 deletions.
@@ -21,7 +21,6 @@
 import java.io.IOException;
 import java.nio.charset.StandardCharsets;
 import java.nio.ByteBuffer;
-import java.nio.file.Files;
 import java.util.List;
 import java.util.Map;

@@ -340,17 +339,17 @@ protected Path getRecoveryPath(String fileName) {
    * when it previously was not. If YARN NM recovery is enabled it uses that path, otherwise
    * it will uses a YARN local dir.
    */
-  protected File initRecoveryDb(String dbFileName) {
+  protected File initRecoveryDb(String dbName) {
     if (_recoveryPath != null) {
-      File recoveryFile = new File(_recoveryPath.toUri().getPath(), dbFileName);
+      File recoveryFile = new File(_recoveryPath.toUri().getPath(), dbName);
       if (recoveryFile.exists()) {
         return recoveryFile;
       }
     }
     // db doesn't exist in recovery path go check local dirs for it
     String[] localDirs = _conf.getTrimmedStrings("yarn.nodemanager.local-dirs");
     for (String dir : localDirs) {
-      File f = new File(new Path(dir).toUri().getPath(), dbFileName);
+      File f = new File(new Path(dir).toUri().getPath(), dbName);
       if (f.exists()) {
         if (_recoveryPath == null) {
           // If NM recovery is not enabled, we should specify the recovery path using NM local
@@ -363,25 +362,29 @@ protected File initRecoveryDb(String dbFileName) {
           // make sure to move all DBs to the recovery path from the old NM local dirs.
           // If another DB was initialized first just make sure all the DBs are in the same
           // location.
-          File newLoc = new File(_recoveryPath.toUri().getPath(), dbFileName);
-          if (!newLoc.equals(f)) {
+          Path newLoc = new Path(_recoveryPath, dbName);
+          Path copyFrom = new Path(f.toURI());
+          if (!newLoc.equals(copyFrom)) {
+            logger.info("Moving " + copyFrom + " to: " + newLoc);
             try {
-              Files.move(f.toPath(), newLoc.toPath());
+              // The move here needs to handle moving non-empty directories across NFS mounts
+              FileSystem fs = FileSystem.getLocal(_conf);
+              fs.rename(copyFrom, newLoc);
             } catch (Exception e) {
               // Fail to move recovery file to new path, just continue on with new DB location
               logger.error("Failed to move recovery file {} to the path {}",
-                dbFileName, _recoveryPath.toString(), e);
+                dbName, _recoveryPath.toString(), e);
             }
           }
-          return newLoc;
+          return new File(newLoc.toUri().getPath());
         }
       }
     }
     if (_recoveryPath == null) {
       _recoveryPath = new Path(localDirs[0]);
     }
 
-    return new File(_recoveryPath.toUri().getPath(), dbFileName);
+    return new File(_recoveryPath.toUri().getPath(), dbName);
   }
 
   /**
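
The lookup order that `initRecoveryDb` follows can be summarized in a Hadoop-free sketch. Everything here is illustrative: `RecoveryDbLocator`, `Mover`, and `locate` are hypothetical names, the sketch is stateless (the real method also updates the `_recoveryPath` field), and the branch for a missing recovery path is an assumption because the corresponding lines are collapsed in the diff.

```java
import java.io.File;
import java.util.List;

/** Hypothetical, Hadoop-free sketch of the lookup order; not the code from the patch. */
final class RecoveryDbLocator {

  /** Strategy for relocating a DB directory; the patch uses Hadoop's local FileSystem here. */
  interface Mover {
    void move(File from, File to) throws Exception;
  }

  static File locate(File recoveryDir, List<File> localDirs, String dbName, Mover mover) {
    // 1. Prefer a DB that already lives in the recovery path.
    if (recoveryDir != null) {
      File inRecovery = new File(recoveryDir, dbName);
      if (inRecovery.exists()) {
        return inRecovery;
      }
    }
    // 2. Otherwise search the NM local dirs for a DB left behind by an earlier run.
    for (File dir : localDirs) {
      File found = new File(dir, dbName);
      if (!found.exists()) {
        continue;
      }
      if (recoveryDir == null) {
        // Assumption: with no recovery path configured, the DB is used where it was found.
        return found;
      }
      File target = new File(recoveryDir, dbName);
      if (!target.equals(found)) {
        try {
          mover.move(found, target);
        } catch (Exception e) {
          // Mirrors the patch: report the failure and continue with the new location.
          System.err.println("Failed to move " + found + " to " + target + ": " + e);
        }
      }
      return target;
    }
    // 3. Nothing found: a fresh DB goes in the recovery path, else in the first local dir.
    File base = (recoveryDir != null) ? recoveryDir : localDirs.get(0);
    return new File(base, dbName);
  }
}
```

Passing the move in as a strategy keeps the sketch independent of how the directory is relocated, which is the only part this commit changes.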