Fix TC reference counting of LQH-Commit-Ack markers so that
markers are not leaked at LQH.
TC has one TC-Commit-Ack-Marker record per transaction which
is used to track which nodes and LDM instances hold
LQH-Commit-Ack-Marker records.
This is used when receiving TC_COMMIT_ACK to know which nodes
and LDM instances should be sent a REMOVE_MARKER_ORD signal.
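As a rough illustration of this TC-side bookkeeping, the C++ sketch
below models a TC-Commit-Ack marker as a set of (node, LDM instance)
pairs. When TC_COMMIT_ACK arrives, each recorded pair becomes one
REMOVE_MARKER_ORD destination. This is a simplified model, not the
Dbtc record. The names TcCommitAckMarker, recordHolder and
removeMarkerOrdTargets are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical, simplified model of a TC-Commit-Ack marker; not the Dbtc record.
using NodeId = std::uint32_t;
using LdmInstance = std::uint32_t;
using MarkerHolder = std::pair<NodeId, LdmInstance>;

struct TcCommitAckMarker {
  // Every (data node, LDM instance) holding an LQH-Commit-Ack marker for this
  // transaction. std::set silently ignores duplicate insertions.
  std::set<MarkerHolder> holders;

  void recordHolder(NodeId node, LdmInstance instance) {
    holders.insert(MarkerHolder{node, instance});
  }

  // On TC_COMMIT_ACK: one REMOVE_MARKER_ORD per recorded holder, returned in
  // ascending (node, instance) order.
  std::vector<MarkerHolder> removeMarkerOrdTargets() const {
    // Model assumption: TC_COMMIT_ACK only arrives for a marker with holders.
    assert(!holders.empty());
    return std::vector<MarkerHolder>(holders.begin(), holders.end());
  }
};
```

Using a set means a given LDM instance is recorded, and later sent
REMOVE_MARKER_ORD, only once, however many times it is reported.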
TC only needs one operation per transaction to have
LQH-Commit-Ack marker records (in each live node in one
nodegroup), so the approach taken is to request them
for all write operations until one of the write
operations succeeds (and keeps its marker at LQH).
After this, subsequent write operations need not allocate
markers at LQH.
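The request policy can be sketched as follows. This is a simplified
model, not the Dbtc logic. It assumes that all writes in a batch are
sent before any LQHKEYCONF returns, so every write in the first batch
requests a marker. TxMarkerPolicy and markersRequestedForBatch are
hypothetical names.

```cpp
#include <cassert>

// Hypothetical, simplified model; names do not come from the NDB source.
struct TxMarkerPolicy {
  bool markerKept = false;  // some write has succeeded and kept its LQH marker

  // A write asks LQH for an LQH-Commit-Ack marker only until one is kept.
  bool shouldRequestMarker(bool isWrite) const { return isWrite && !markerKept; }

  // LQHKEYCONF for a write that requested a marker: LQH has kept it.
  void onLqhKeyConf(bool requestedMarker) {
    if (requestedMarker) markerKept = true;
  }
};

// Assumes every write in the batch is sent before any reply arrives.
// Returns how many of the batch's writes request a marker.
int markersRequestedForBatch(const TxMarkerPolicy& tx, int writes) {
  assert(writes >= 0);
  int requested = 0;
  for (int i = 0; i < writes; ++i) {
    if (tx.shouldRequestMarker(true)) ++requested;
  }
  return requested;
}
```

For example, a first batch of three writes requests three markers.
Once one LQHKEYCONF has arrived for a marker-requesting write, later
batches request none.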
Write operations that fail without immediately causing a
transaction abort (e.g. those defined with IgnoreError that
find no row, find that the row already exists, or fail in a
similar way) are aborted and discarded at LQH, so they
leave no LQH-Commit-Ack marker.
Where a transaction prepares write operations that all fail at
LQH, there will be no LQH-Commit-Ack markers, and so no need
for a TC-Commit-Ack marker. This is handled using a reference
count of how many LQH-Commit-Ack markers have been requested
*or acknowledged*. If this count drops to zero, there is no need
for a TC-Commit-Ack marker.
TC uses a per-transaction state and a per-transaction reference
counter to manage this.
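The intended counting rule can be sketched as a small C++ model.
Requesting a marker takes a reference, and an acknowledgement
(LQHKEYCONF) deliberately keeps it. Only a write that fails at LQH,
whose marker is discarded there, releases its reference. The TC
marker stays needed while the count is non-zero. TxMarkerState and
its members are hypothetical, and the model ignores everything else
in the real per-transaction state.

```cpp
#include <cassert>

// Hypothetical, simplified per-transaction state (not the Dbtc record layout).
struct TxMarkerState {
  int markerRefs = 0;           // LQH markers requested *or* acknowledged
  bool tcMarkerNeeded = false;  // does TC need a TC-Commit-Ack marker?

  void onMarkerRequested() {
    ++markerRefs;
    tcMarkerNeeded = true;
  }

  // LQHKEYCONF: the marker is now held at LQH. The reference is intentionally
  // kept, so markerRefs is not changed here.
  void onMarkerAcknowledged() { assert(markerRefs > 0); }

  // An IgnoreError write failed at LQH: it was aborted there and its marker
  // discarded, so its reference is released.
  void onRequestedMarkerDiscarded() {
    assert(markerRefs > 0);
    if (--markerRefs == 0) {
      tcMarkerNeeded = false;  // no LQH markers anywhere, so no TC marker needed
    }
  }
};
```

If every write fails, the count returns to zero and no TC marker is
needed. If one write was acknowledged, its reference keeps the count
above zero.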
The bug is that the reference count covered only the
outstanding requests, and not the LQH-Commit-Ack markers that
had been acknowledged. In other words, the reference count was
decremented in execLQHKEYCONF, even though LQHKEYCONF signifies
that an LQH-Commit-Ack marker has been allocated on that LQH
instance.
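In certain situations this resulted in allocated LQH-Commit-Ack
markers being leaked. For example, a later write in the same
transaction may fail at LQH and drop the count to zero while
another LQH instance still holds an acknowledged marker, so no
REMOVE_MARKER_ORD is ever sent for that marker. Eventually the
leak can make the cluster read-only, as new write operations
cannot allocate LQH-Commit-Ack markers.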
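One sequence that leaks under a simplified model is replayed below,
with and without the old decrement in LQHKEYCONF handling. Two
batched writes request markers. The first succeeds and the second
fails with IgnoreError. This is an illustrative model, not the Dbtc
code. LeakTx, requestMarker, lqhKeyConf, lqhKeyRef and leakedMarkers
are hypothetical names, and the real triggering situations may differ.

```cpp
#include <cassert>

// Hypothetical model replaying one leak sequence; not the Dbtc code.
struct LeakTx {
  int markerRefs = 0;           // per-transaction reference count
  bool tcMarkerNeeded = false;  // does TC still track this transaction's markers?
  int markersHeldAtLqh = 0;     // LQH-Commit-Ack markers actually kept at LQH
};

void requestMarker(LeakTx& tx) {
  ++tx.markerRefs;
  tx.tcMarkerNeeded = true;
}

// LQHKEYCONF: LQH has kept the marker for this operation.
void lqhKeyConf(LeakTx& tx, bool oldBehaviour) {
  ++tx.markersHeldAtLqh;
  if (oldBehaviour) --tx.markerRefs;  // the bug: acknowledged marker stops counting
}

// LQHKEYREF for an IgnoreError write: LQH discards the operation and its marker.
void lqhKeyRef(LeakTx& tx) {
  assert(tx.markerRefs > 0);
  if (--tx.markerRefs == 0) tx.tcMarkerNeeded = false;
}

// Two writes in one batch request markers; the first succeeds, the second fails.
// Returns how many LQH markers are left without a TC marker to remove them.
int leakedMarkers(bool oldBehaviour) {
  LeakTx tx;
  requestMarker(tx);
  requestMarker(tx);
  lqhKeyConf(tx, oldBehaviour);
  lqhKeyRef(tx);
  return tx.tcMarkerNeeded ? 0 : tx.markersHeldAtLqh;
}
```

With the old behaviour, leakedMarkers(true) returns 1: the count hits
zero while LQH still holds a marker. With the fix, leakedMarkers(false)
returns 0.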
The bug seems to have been introduced as part of the fix for
Bug #19451060 BUG#73339 IN MYSQL BUG SYSTEM, NDBREQUIRE INCORRECT.
The fix is to *not* decrement the reference count in execLQHKEYCONF.
However, the existing implementation 'forgets' that an operation
resulted in a marker allocation (and a reference count increment)
once LQHKEYCONF has been processed.
To solve this, TC is modified to record which operations caused
LQH-Commit-Ack markers to be allocated, so that during the
per-operation phase of transaction ABORT or COMMIT, the
reference count can be decremented for those operations and
re-checked for consistency.
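A simplified C++ model of the fix follows. Each operation carries a
flag recording that it took a marker reference, and that flag
outlives LQHKEYCONF. The reference is released either when the write
fails at LQH or during the per-operation COMMIT/ABORT pass, after
which the count must be zero. Operation, Transaction, holdsMarkerRef,
onLqhKeyRef and completeOperations are hypothetical names, not the
fields or methods actually added to Dbtc.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of the fix; names are illustrative only.
struct Operation {
  bool holdsMarkerRef = false;  // this op's LQH marker is counted in markerRefs
};

struct Transaction {
  int markerRefs = 0;
  std::vector<Operation> ops;

  void prepareWithMarker() {
    ops.push_back(Operation{true});
    ++markerRefs;
  }
  void prepareWithoutMarker() { ops.push_back(Operation{false}); }

  // IgnoreError write failed at LQH: its marker is discarded there, so its
  // reference is dropped now and the flag cleared.
  void onLqhKeyRef(std::size_t i) {
    Operation& op = ops.at(i);
    if (op.holdsMarkerRef) {
      --markerRefs;
      op.holdsMarkerRef = false;
    }
  }

  // Per-operation phase of COMMIT or ABORT: the flag tells us which ops still
  // hold a reference, so the count is dropped here, not at LQHKEYCONF.
  void completeOperations() {
    for (Operation& op : ops) {
      if (op.holdsMarkerRef) {
        assert(markerRefs > 0);
        --markerRefs;
        op.holdsMarkerRef = false;
      }
    }
    assert(markerRefs == 0);  // consistency check: every reference accounted for
  }
};
```

Keeping the flag on the operation rather than only on the
transaction is what lets the final pass know exactly which references
remain.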
Some additional jam()s and comments are added.
A new pool, LQH Commit Ack Markers, is added to ndbinfo.ndb$pools.
This is used in the testcase to ensure that all LQH-Commit-Ack
markers are released, and may be useful for problem diagnosis
in the future.
The test uses replication to get batching of write operations
with the NdbApi AO_IgnoreError flag set.
Some basic transaction abort testcases are also added; these
exposed problems with a partial fix.