tablet: acquire/release locks in batches
Prior to this patch, each row operation required a separate call into the
LockManager to acquire or release its corresponding row lock. Each of these
calls performed atomic operations on the LockManager, bouncing the cache line
containing the spinlock(s) between the prepare thread (which acquires locks)
and the apply threads (which release them). Atomic operations on cache lines
owned by remote cores are quite expensive, so this caused a lot of CPU usage:
the lock-related methods took more CPU than the actual work on the
prepare/apply threads.

This patch changes the LockManager API somewhat to allow acquiring and
releasing locks in bulk. WriteOp now contains a single ScopedRowLock object
that holds all of the required locks, instead of a separate ScopedRowLock per
row.

The acquire/release paths are also optimized to get better instruction-level
parallelism in the hash table lookups: prefetches are used to bring the
appropriate cache lines into the CPU cache before they are read.

This patch also simplifies the locking to use a single LockManager-wide lock
instead of per-bucket locks. Per-bucket locks would still be possible, but
they don't appear to be beneficial and would complicate the code path quite a
bit. Removing them also saves significant memory in the LockManager hash
tables.

Benchmarked by running:

  $ perf stat ./build/release/bin/kudu tserver run -fs-wal-dir /tmp/ts \
      -enable_maintenance_manager=0 -unlock-unsafe-flags

and

  $ kudu perf loadgen localhost -num_rows_per_thread=10000000 -num_threads=8

This workload ends up somewhat client-bound on my machine, so wall-clock time
isn't improved much, but tserver CPU consumption is reduced by about 23%.

Before:

 Performance counter stats for './build/release/bin/kudu tserver run
 -fs-wal-dir /tmp/ts -enable_maintenance_manager=0 -unlock-unsafe-flags':

         177786.37 msec task-clock                #    7.648 CPUs utilized
            215350      context-switches          #    0.001 M/sec
             55653      cpu-migrations            #    0.313 K/sec
           3219417      page-faults               #    0.018 M/sec
      724369004287      cycles                    #    4.074 GHz                     (83.32%)
      142799406201      stalled-cycles-frontend   #   19.71% frontend cycles idle    (83.36%)
      109903159567      stalled-cycles-backend    #   15.17% backend cycles idle     (83.40%)
      719706215568      instructions              #    0.99  insn per cycle
                                                  #    0.20  stalled cycles per insn (83.30%)
      131445739053      branches                  #  739.347 M/sec                   (83.31%)
         479779584      branch-misses             #    0.37% of all branches         (83.32%)

     165.187526000 seconds user
      12.866637000 seconds sys

After:

 Performance counter stats for './build/release/bin/kudu tserver run
 -fs-wal-dir /tmp/ts -enable_maintenance_manager=0 -unlock-unsafe-flags':

         145986.70 msec task-clock                #    6.063 CPUs utilized
            202600      context-switches          #    0.001 M/sec
             51442      cpu-migrations            #    0.352 K/sec
           3368490      page-faults               #    0.023 M/sec
      597915142173      cycles                    #    4.096 GHz                     (83.51%)
       58333772996      stalled-cycles-frontend   #    9.76% frontend cycles idle    (83.35%)
      104785221789      stalled-cycles-backend    #   17.53% backend cycles idle     (83.18%)
      690655964982      instructions              #    1.16  insn per cycle
                                                  #    0.15  stalled cycles per insn (83.38%)
      126988529873      branches                  #  869.864 M/sec                   (83.40%)
         469031328      branch-misses             #    0.37% of all branches         (83.17%)

     134.072172000 seconds user
      12.192747000 seconds sys

Change-Id: I3cb724e953ecdf188a35181c2f91b721b3416524
Reviewed-on: http://gerrit.cloudera.org:8080/16169
Tested-by: Todd Lipcon <[email protected]>
Reviewed-by: Andrew Wong <[email protected]>
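To make the batching concrete, here is a minimal sketch of what a bulk
acquire/release API can look like. It is an illustration only, not Kudu's
implementation: the names (AcquireBatch, ReleaseBatch, ScopedRowLockBatch),
the hash-set representation, and the sort-to-avoid-deadlock step are all
assumptions made for the example.

  // Hedged sketch of batched row locking (not Kudu's actual code).
  // One critical section covers the whole batch, so the cache line
  // holding the manager-wide mutex moves between the prepare and apply
  // threads once per batch rather than once per row.
  #include <algorithm>
  #include <condition_variable>
  #include <cstdint>
  #include <mutex>
  #include <unordered_set>
  #include <vector>

  class LockManager {
   public:
    // Acquires every row lock in 'row_hashes' under one critical section.
    // Callers must pass the hashes sorted and deduplicated so that two
    // overlapping batches cannot deadlock against each other.
    void AcquireBatch(const std::vector<uint64_t>& row_hashes) {
      std::unique_lock<std::mutex> l(lock_);
      for (uint64_t h : row_hashes) {
        cv_.wait(l, [&] { return held_.count(h) == 0; });
        held_.insert(h);
      }
    }

    // Releases every row lock in 'row_hashes' under one critical section.
    void ReleaseBatch(const std::vector<uint64_t>& row_hashes) {
      {
        std::lock_guard<std::mutex> l(lock_);
        for (uint64_t h : row_hashes) held_.erase(h);
      }
      cv_.notify_all();
    }

   private:
    std::mutex lock_;                    // single manager-wide lock
    std::condition_variable cv_;
    std::unordered_set<uint64_t> held_;  // hashes of currently locked rows
  };

  // RAII holder for the whole batch, analogous to the single ScopedRowLock
  // that WriteOp now carries: all locks are taken in the constructor and
  // dropped in the destructor.
  class ScopedRowLockBatch {
   public:
    ScopedRowLockBatch(LockManager* lm, std::vector<uint64_t> hashes)
        : lm_(lm), hashes_(std::move(hashes)) {
      std::sort(hashes_.begin(), hashes_.end());
      hashes_.erase(std::unique(hashes_.begin(), hashes_.end()),
                    hashes_.end());
      lm_->AcquireBatch(hashes_);
    }
    ~ScopedRowLockBatch() { lm_->ReleaseBatch(hashes_); }

   private:
    LockManager* lm_;
    std::vector<uint64_t> hashes_;
  };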
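The prefetching mentioned above can be sketched as a two-pass loop over the
batch: first issue a prefetch for every bucket that will be probed, then do
the actual lookups, so the cache misses overlap instead of serializing.
__builtin_prefetch is the GCC/Clang intrinsic; the open-addressed bucket
layout below is an assumption for illustration, not Kudu's hash table.

  // Hedged sketch of prefetched batch lookups (illustrative layout only).
  #include <cstdint>
  #include <vector>

  struct Bucket {
    uint64_t hash = 0;  // hash of the row key locked here; 0 means empty
  };

  // Returns true if any hash in the batch is already present in 'table'.
  // 'table.size()' is assumed to be a power of two.
  bool AnyLocked(const std::vector<Bucket>& table,
                 const std::vector<uint64_t>& hashes) {
    const size_t mask = table.size() - 1;

    // Pass 1: start the loads early. Each prefetch is independent, so
    // the CPU can have many bucket cache lines in flight at once.
    for (uint64_t h : hashes) {
      __builtin_prefetch(&table[h & mask]);
    }

    // Pass 2: by the time each bucket is read, its cache line is likely
    // already resident, so the probes no longer stall one after another.
    for (uint64_t h : hashes) {
      if (table[h & mask].hash == h) return true;
    }
    return false;
  }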