This is a breakout, decomposition of issue "db.connect()/disconnect() issues: db.prop and db.syncStatus.prop values, delayed replication, and read your own writes anomaly" #226
The "Local changes reverted when a replication delay is present" bug in powersync-sqlite-core was easy to reproduce in v1.10.0 and became harder to reproduce in v1.11.0 when using disconnecting/connecting to trigger it.
At the current version, v.12.1, adding some network instability (e.g. an occasional network partition) on top of disconnecting/connecting makes it possible to reliably trigger the bug.
Under these conditions, the bug presents in two ways:
- a {key: value} is read that is older than the current, most recent read, i.e. a value already seen in earlier reads; the {key: value} appears non-monotonic to the client, going backwards in time
- the client reads an empty table (SELECT * returns []) even though there have been successful prior reads
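The two symptoms above can be expressed as a small check over one client's successive reads. This is only a minimal sketch, not the fuzzer's actual checker: the function name `find_read_anomaly` is hypothetical, each read is modeled as a plain `{key: value}` dict, and it assumes values for a key only ever increase over time, so a smaller value means an older one.

```python
from typing import Dict, List, Optional


def find_read_anomaly(reads: List[Dict[str, int]]) -> Optional[str]:
    """Return a description of the first anomaly in one client's reads, or None.

    Assumption (not from the issue): values per key only grow over time,
    so a smaller value than one previously read is an older value.
    """
    latest: Dict[str, int] = {}  # most recent value read so far, per key
    seen_rows = False            # True once any read returned at least one row

    for i, read in enumerate(reads):
        # Symptom 2: an empty table after earlier successful, non-empty reads.
        if not read and seen_rows:
            return f"read {i}: empty table after earlier non-empty reads"

        # Symptom 1: a key's value goes backwards relative to a prior read.
        for key, value in read.items():
            if key in latest and value < latest[key]:
                return (
                    f"read {i}: key {key!r} went backwards "
                    f"from {latest[key]} to {value}"
                )
            latest[key] = value

        seen_rows = seen_rows or bool(read)

    return None
```

For example, `find_read_anomaly([{"k": 1}, {"k": 3}, {"k": 2}])` reports the third read as going backwards, while a sequence that only moves forward returns `None`.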
A fuzzing application has been created to reproduce this and other anomalies, and fuzz test PowerSync in general.
There may actually be multiple bugs exposed by disconnecting/connecting with a flakey network.
The GitHub Actions workflow runs a matrix of 240 jobs, of which:
- 25 fail with non monotonic reads
- 16 fail with empty table reads
- 141 fail with divergent final reads
The test matrix has been tuned over time to elicit maximum failure.
This issue focuses on the non monotonic or empty read. There are cleaner ways to produce divergent reads, and they behave differently, so I will be opening an independent issue for them.
The first local Docker build takes a bit of time, mostly creating the Flutter/Dart build image.
The local script that loops until error will sometimes find an error on the first attempt; other times it will grind (fuzz) for a bit.
It's the nature of fuzzing.
I know this is a lot.
I hope that the fuzzing environment can be made easy to use, maybe even run as an action on commits, so please ask questions and request any changes that would be helpful.
Thanks!
I got the fuzzing scripts running on my machine. The first iteration failed with Divergent final reads!, but the actual cause appeared to be a timeout while downloading the data: waited for db.currentStatus.downloading == false: tried 31 times every 1000ms. I haven't investigated this further.
The second iteration did fail with "non monotonic reads". So at the very least I can reproduce your results and start investigating the issue.
The first iteration failed with Divergent final reads!, but the actual cause appeared to be a timeout while downloading the data: waited for db.currentStatus.downloading == false: tried 31 times every 1000ms
It's a bit arbitrary to say the test is conclusively complete when faults disrupt replication. After running transactions at the given --rate for the given --time, the test:
- waits for 3 seconds
- then waits for UploadQueueStats.count == 0 on all clients
- then waits for SyncStatus.downloading == false on all clients
- then does final reads on each client for comparison.
It seems that once faults disrupt replication to the point it doesn't resume, SyncStatus.downloading stays true indefinitely. The test gives it 30s before calling it quits and doing final reads.
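The end-of-test sequence described above amounts to a bounded polling loop. Here is a rough sketch under stated assumptions, not the fuzzer's code: `wait_until`, `quiesce`, and the client methods `upload_queue_count()` and `downloading()` are hypothetical stand-ins for UploadQueueStats.count and SyncStatus.downloading. The defaults assume that "tried 31 times every 1000ms" means 31 checks separated by 30 one-second sleeps, which would match the 30s budget.

```python
import time
from typing import Callable, Iterable


def wait_until(
    condition: Callable[[], bool],
    tries: int = 31,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `condition` up to `tries` times, sleeping between attempts.

    Returns True as soon as the condition holds, False if it never does.
    """
    for attempt in range(tries):
        if condition():
            return True
        if attempt < tries - 1:  # no sleep after the final attempt
            sleep(interval_s)
    return False


def quiesce(clients: Iterable, sleep: Callable[[float], None] = time.sleep) -> bool:
    """Run the end-of-test waits; True only if every wait succeeded.

    `clients` are hypothetical objects exposing `upload_queue_count()` and
    `downloading()`; the real fuzzer reads these from the PowerSync client.
    """
    clients = list(clients)
    sleep(3.0)  # fixed settle time before checking anything

    # Wait for every client's upload queue to drain.
    uploads = [
        wait_until(lambda c=c: c.upload_queue_count() == 0, sleep=sleep)
        for c in clients
    ]
    # Then wait for every client to stop downloading.
    downloads = [
        wait_until(lambda c=c: not c.downloading(), sleep=sleep)
        for c in clients
    ]
    return all(uploads) and all(downloads)
```

A `False` result from `quiesce` corresponds to the timeout case in this thread: the final reads still happen, but they are taken from clients that may not have finished replicating, so a divergent final read is not conclusive on its own.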
I'll work on a better/cleaner/clearer way to:
- detect that replication has stopped
- detect that no clients are being updated
- try to identify the last replicated transaction for each client
Hi,
This issue is also a way to trigger "Local changes reverted when a replication delay is present" (powersync-ja/powersync-sqlite-core#54).
Please see A Dart CLI fuzzer for PowerSync.
There's a GitHub Action matrix to fuzz-non-monotonic-reads.
And a script for reproducing locally: Local Host - Helper Script to Loop Until Error.
Here's a typical example of working through a failed powersync_fuzz.log output.
./powersync-fuzz-loop.sh ./powersync_fuzz --table mww --clients 5 --rate 10 --time 100 --disconnect --partition
We can see the disconnecting, connecting, and partitioning nemesis activity:
We can see the nemesis is triggering upload/download errors in the SyncStatus stream; the client treats them as ignorable:
And now for the non-monotonic read error.
At the end of powersync_fuzz.log, the SEVERE messages will show the cause.
The easiest cause to debug is "expected because myPreviousRead".
Here is clientNum: 3 reading backwards in time:
We can see that client 3 read the correct, current value many times previously:
And observe the correct value originally being written, uploaded, and read by client 1: