Skip to content

Commit

Permalink
KUDU-2218. tls_socket: properly handle temporary socket errors in Writev
Browse files Browse the repository at this point in the history
This fixes a bug which caused RaftConsensusITest.TestLargeBatches to
fail when run under stress, as in the following command line:

taskset -c 0-4 \
 build/latest/bin/raft_consensus-itest \
   --gtest_filter=\*LargeBat\* \
   --stress-cpu-threads=8

This would produce an error like:
Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:878

This means that we were retrying a write after getting EAGAIN, but with
a different buffer than the first time.

I tracked this down to mishandling of temporary socket errors in
TlsSocket::Writev(). In the case that we successfully write part of the
io vector but hit such an error trying to write a later element in the
vector, we were still propagating the error back up to the caller. The
caller didn't realize that part of the write was successful, and thus it
would retry the write from the beginning.

The fix is to fix the above, but also to enable partial writes in
TlsContext. The new test fails if either of the above two changes are
backed out.

Change-Id: If797f220f42bfb2e6f452b66f15e7a758e883472
Reviewed-on: http://gerrit.cloudera.org:8080/8570
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <[email protected]>
Reviewed-on: http://gerrit.cloudera.org:8080/9361
Reviewed-by: Michael Ho <[email protected]>
Tested-by: Impala Public Jenkins
  • Loading branch information
toddlipcon authored and Impala Public Jenkins committed Feb 21, 2018
1 parent baec8ca commit 678bf28
Show file tree
Hide file tree
Showing 3 changed files with 219 additions and 70 deletions.
2 changes: 1 addition & 1 deletion be/src/kudu/security/tls_context.cc
Original file line number Diff line number Diff line change
Expand Up @@ -96,7 +96,7 @@ Status TlsContext::Init() {
if (!ctx_) {
return Status::RuntimeError("failed to create TLS context", GetOpenSSLErrors());
}
SSL_CTX_set_mode(ctx_.get(), SSL_MODE_AUTO_RETRY);
SSL_CTX_set_mode(ctx_.get(), SSL_MODE_AUTO_RETRY | SSL_MODE_ENABLE_PARTIAL_WRITE);

// Disable SSLv2 and SSLv3 which are vulnerable to various issues such as POODLE.
// We support versions back to TLSv1.0 since OpenSSL on RHEL 6.4 and earlier does not
Expand Down
10 changes: 8 additions & 2 deletions be/src/kudu/security/tls_socket.cc
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@
#include "kudu/security/cert.h"
#include "kudu/security/openssl_util.h"
#include "kudu/util/errno.h"
#include "kudu/util/net/socket.h"

namespace kudu {
namespace security {
Expand All @@ -42,11 +43,11 @@ Status TlsSocket::Write(const uint8_t *buf, int32_t amt, int32_t *nwritten) {
CHECK(ssl_);
SCOPED_OPENSSL_NO_PENDING_ERRORS;

*nwritten = 0;
if (PREDICT_FALSE(amt == 0)) {
// Writing an empty buffer is a no-op. This happens occasionally, eg in the
// case where the response has an empty sidecar. We have to special case
// it, because SSL_write can return '0' to indicate certain types of errors.
*nwritten = 0;
return Status::OK();
}

Expand All @@ -61,7 +62,6 @@ Status TlsSocket::Write(const uint8_t *buf, int32_t amt, int32_t *nwritten) {
ErrnoToString(save_errno), save_errno);
}
// Socket not ready to write yet.
*nwritten = 0;
return Status::OK();
}
return Status::NetworkError("failed to write to TLS socket",
Expand Down Expand Up @@ -90,6 +90,12 @@ Status TlsSocket::Writev(const struct ::iovec *iov, int iov_len, int32_t *nwritt
}
RETURN_NOT_OK(SetTcpCork(0));
*nwritten = total_written;
// If we did manage to write something, but not everything, due to a temporary socket
// error, then we should still return an OK status indicating a successful _partial_
// write.
if (total_written > 0 && Socket::IsTemporarySocketError(write_status.posix_code())) {
return Status::OK();
}
return write_status;
}

Expand Down
Loading

0 comments on commit 678bf28

Please sign in to comment.