Tags: clementperon/nsync
Tags
Update nsync Thread Sanitizer (TSan) annotations. I had previously added some TSan dynamic annotations to nsync to avoid TSan race warnings. However, depending on whether gcc or clang was used, and depending on the version of clang used, TSan race warnings could be reported on the nsync tests. When TSan is in used, this change adds a call to AnnotateRWLockAcquired just after each nsync mu acquistion, and a call AnnotateRWLockReleased just before each release. Additionally, AnnotateRWLockCreate is called at each mu creation. (nsync mutex destruction is inferred when memory is recycled.) Macros are used so that these additional calls are not introduced when TSan is not in use. All tests now pass without warnings in TSan mode in both clang and gcc, including in all the versions that previously reported warnings. In addition, the version number is incremented, and there are a few minor whitespace changes to make the TSan annotations conform the the project's whitespace conventions.
In nsync, avoid an ABSL depepdency for TSAN annotations. The code used to use ABSL macros for TSAN annotations that suppress warnings. This was a problem because: - ABSL is changing the names of the macros(!), and the team seems intent on forcing clients to make the incompatible change, even if the users of those clients have not yet switched to a suifficiently recent version of ABSL. And they are doing this despite their (apparently false) claim of a 5-year API stability guarantee. - The code had hard-wired a header file path that was wrong for most users, effectively preventing them from running with TSAN. - The conditional compilation to detect TSAN worked only for clang, and not for gcc, even though gcc supports TSAN. This change drops the ABSL dependency so there's no header file to include; it uses conditional compilation that detects TSAN on moth gcc and clang; and it uses the gcc/clang annotation routines directly. The version number is also increased.
Suppress compiler warnings on newer compilers that have additional ch… …ecks. GCC 8.0 in C++ mode introduced a warning if memset() is used to zero a struct, even if it POD. This causes the compiler to be chatty if nsync is build using g++. Most of the changes are to cast the pointer argument to memset() to "void *" to avoid the warning. GCC 8.0 in C++ mode also started warning if a C-style cast changed the "constness" of the data. I'd accidentally done this in testing/testing.c; now fixed. NetBSD has started using C99-style designated initializers in the standard pthread initializer constants even if using C89; clang 8.0 warns about this. This change turns the warning off in the netbsd/clang Makefiles.
If possible, avoid C++11 thread_local for TSD, even in C++11 builds. And when it is unavaoidable, attempt to handle the possibility of C++11's per_thread_waiter destructor being called multiple times on the same variable. Previously, if a C++11 build of nsync was requested, we would use C++11 thread_local to achieve the effects of pthread_key_create()'s destructor argument; that is, to invoke cleanup when a thread exits. Unfortunately, a) C++11 specifies that the destructors of static thread_local variables for the thread calling exit() are called before other static destructors are called. See https://en.cppreference.com/w/cpp/utility/program/exit [Strangely, it doesn't destruct all variables, which makes you wonder why it bothers to destruct any, given that destructing any is dangerous.] b) nsync can be pulled into an address space multiple times via multiple shared libraries into which nsync has been linked. Even (a) alone is dangerous: imagine that a programmer uses a thread_local variable from another static variable's destructor. When coupled with (b), various implementation behaviours become conceivable. For example, it's possible that an implementation could cause the same thread_local variable to be desructed repeatedly, once by each copy of the code pulled into the address space. This change does several things. The first two are related directly to the issue described above. The remainder were found during testing. 1) Uses pthread_key_create/pthread_getspecific/pthread_setspecific whenever possible, to avoid the use of thread_local. That's the version in nsync/platform/posix/src/per_thread_waiter.c. On Windows, we use the code that simulates pthread_key_create etc. 2) If the C++11 version is used, it will try to defend itself against the destructor being called twice, assuming that's a potential issue. That's the version in nsync/platform/c++11/src/per_thread_waiter.cc 3) It fixes a bug in platform/win32/src/pthread_key_win32.cc. Previously the code could try to run a null destructor. This could never happen in practice, because the library never sets a null destructor, but it's good to fix. 4) It casts the pointer passed to Windows' InterlockedCompareExchange(). Different compilers on Windows have different ideas about type compatibility with uint32_t. See tensorflow/tensorflow#31301
Introduce an implementation of nsync's internal semaphore that uses MacOS Grand Central Dispatch primitives. For documentation of that primitive, see: https://developer.apple.com/documentation/dispatch/1452955-dispatch_semaphore_create https://developer.apple.com/documentation/dispatch/1453087-dispatch_semaphore_wait https://developer.apple.com/documentation/dispatch/1452919-dispatch_semaphore_signal This seamaphore is faster for wakeups on the Mac than the implementation using the pthread API. For example, for the ping_pong benchmark (which just does waits and wakeups back and forth between threads) for mu_wait and cv_wait: Benchmark pingpong_test ops time ops/sec time/op ---------------------------------------------------------------------------------------- pthread implementation: benchmark_ping_pong_mu 359375 3.66s 9.8e+04 10.17us benchmark_ping_pong_mu_cv 359375 3.83s 9.4e+04 10.66us gcd_implementation: benchmark_ping_pong_mu 359375 2.54s 1.4e+05 7.08us benchmark_ping_pong_mu_cv 359375 2.51s 1.4e+05 6.98us Create a Makefile that uses the new semaphore implementation on the Mac, for testing. In cv_test.c and mu_wait_test.c, modify the way values are converted between pointers and integers for the producer/consumer performance test to avoid warning on some compilers about pointer arithmetic on the nil pointer.
Patch 1 (google#7) * Update BUILD Our team at Arm Inc. tested Tensorflow that has a dependency on nsync. We found that the cpu type has to be "aarch64" instead of "arm" for correct compilation on aarch64 servers. * Update bazel_BUILD Our team at Arm Inc. tested Tensorflow that has a dependency on nsync. We found that the cpu type has to be "aarch64" instead of "arm" for correct compilation on aarch64 servers.
PreviousNext