Skip to content

Commit

Permalink
ovsdb-idl: Support write-only-changed IDL monitor mode.
Browse files Browse the repository at this point in the history
At a first glance, change tracking should never be allowed for
write-only columns.  However, some clients (e.g., ovn-northd) that are
mostly exclusive writers of a database, use change tracking to avoid
duplicating the IDL row records into a local cache when implementing
incremental processing.

The default behavior of the IDL is to automatically turn a write-only
column into a read-write column whenever the client enables change
tracking for that column.

For the afore mentioned clients, this becomes a performance issue.
Commit 1cc618c ("ovsdb-idl: Fix atomicity of writes that don't
change a column's value.") explains why writes that don't change a
column's value cannot be optimized out early if the column is
read/write.

Furthermore, if there is at least one record in any table that
changed during a transaction, then *all* records that have been
written are added to the transaction, even if their values didn't
change.  If there are many such rows (e.g., like in ovn-northd's
case) this incurs a significant overhead because:
a. the client has to build this large transaction
b. the transaction has to be sent over the network
c. the server needs to parse this (mostly) no-op update

We now introduce new IDL APIs allowing users to set a new monitoring
mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY, to indicate to the IDL that the
atomicity constraints may be relaxed and written columns that don't
change value can be skipped from the current transaction.

We benchmarked ovn-northd performance when using this new mode
against NB and SB databases taken from ovn-kubernetes scale tests.
We noticed that when a minor change is performed to the Northbound
database (e.g., NB_Global.nb_cfg is incremented) the time it takes to
build the Southbound transaction becomes negligible (vs ~1.5 seconds
before this change).

End-to-end ovn-kubernetes scale tests on 120-node clusters also show
significant reduction of latency to bring up pods; both average and P99
latency decreased by ~30%.

Acked-by: Han Zhou <[email protected]>
Signed-off-by: Dumitru Ceara <[email protected]>
Signed-off-by: Ilya Maximets <[email protected]>
  • Loading branch information
dceara authored and igsilya committed Apr 28, 2022
1 parent b7aaf41 commit d94cd0d
Show file tree
Hide file tree
Showing 5 changed files with 115 additions and 14 deletions.
4 changes: 4 additions & 0 deletions NEWS
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,10 @@ Post-v2.17.0
- OVSDB:
* 'relay' service model now supports transaction history, i.e. honors the
'last-txn-id' field in 'monitor_cond_since' requests from clients.
- OVSDB-IDL:
* New monitor mode flag, OVSDB_IDL_WRITE_CHANGED_ONLY, allowing
applications to relax atomicity requirements when dealing with
columns whose value has been rewritten (but not changed).


v2.17.0 - 17 Feb 2022
Expand Down
60 changes: 54 additions & 6 deletions lib/ovsdb-idl.c
Original file line number Diff line number Diff line change
Expand Up @@ -1396,6 +1396,45 @@ ovsdb_idl_track_clear(struct ovsdb_idl *idl)
{
ovsdb_idl_track_clear__(idl, false);
}

/* Sets or clears (depending on 'enable') OVSDB_IDL_WRITE_CHANGED_ONLY
* for 'column' in 'idl', ensuring that the column will be included in a
* transaction only if its value has actually changed locally. Normally
* read/write columns that are written to are always included in the
* transaction but, in specific cases, when the application doesn't
* require atomicity of writes across different columns, the ones that
* don't change value may be skipped.
*
* This function should be called between ovsdb_idl_create() and
* the first call to ovsdb_idl_run().
*/
void
ovsdb_idl_set_write_changed_only(struct ovsdb_idl *idl,
const struct ovsdb_idl_column *column,
bool enable)
{
if (enable) {
*ovsdb_idl_get_mode(idl, column) |= OVSDB_IDL_WRITE_CHANGED_ONLY;
} else {
*ovsdb_idl_get_mode(idl, column) &= ~OVSDB_IDL_WRITE_CHANGED_ONLY;
}
}

/* Helper function to wrap calling ovsdb_idl_set_write_changed_only() for
* all columns that are part of 'idl'.
*/
void
ovsdb_idl_set_write_changed_only_all(struct ovsdb_idl *idl, bool enable)
{
for (size_t i = 0; i < idl->class_->n_tables; i++) {
const struct ovsdb_idl_table_class *tc = &idl->class_->tables[i];

for (size_t j = 0; j < tc->n_columns; j++) {
const struct ovsdb_idl_column *column = &tc->columns[j];
ovsdb_idl_set_write_changed_only(idl, column, enable);
}
}
}

static void
log_parse_update_error(struct ovsdb_error *error)
Expand Down Expand Up @@ -3491,6 +3530,8 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_,
{
struct ovsdb_idl_row *row = CONST_CAST(struct ovsdb_idl_row *, row_);
const struct ovsdb_idl_table_class *class;
unsigned char column_mode;
bool optimize_rewritten;
size_t column_idx;
bool write_only;

Expand All @@ -3501,12 +3542,15 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_,

class = row->table->class_;
column_idx = column - class->columns;
write_only = row->table->modes[column_idx] == OVSDB_IDL_MONITOR;
column_mode = row->table->modes[column_idx];
write_only = column_mode == OVSDB_IDL_MONITOR;
optimize_rewritten =
write_only || (column_mode & OVSDB_IDL_WRITE_CHANGED_ONLY);


ovs_assert(row->new_datum != NULL);
ovs_assert(column_idx < class->n_columns);
ovs_assert(row->old_datum == NULL ||
row->table->modes[column_idx] & OVSDB_IDL_MONITOR);
ovs_assert(row->old_datum == NULL || column_mode & OVSDB_IDL_MONITOR);

if (row->table->idl->verify_write_only && !write_only) {
VLOG_ERR("Bug: Attempt to write to a read/write column (%s:%s) when"
Expand All @@ -3524,9 +3568,13 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_,
* different value in that column since we read it. (But if a whole
* transaction only does writes of existing values, without making any real
* changes, we will drop the whole transaction later in
* ovsdb_idl_txn_commit().) */
if (write_only && ovsdb_datum_equals(ovsdb_idl_read(row, column),
datum, &column->type)) {
* ovsdb_idl_txn_commit().)
*
* The application may choose to bypass this restriction and always
* optimize by setting OVSDB_IDL_WRITE_CHANGED_ONLY.
*/
if (optimize_rewritten && ovsdb_datum_equals(ovsdb_idl_read(row, column),
datum, &column->type)) {
goto discard_datum;
}

Expand Down
16 changes: 14 additions & 2 deletions lib/ovsdb-idl.h
Original file line number Diff line number Diff line change
Expand Up @@ -158,7 +158,7 @@ bool ovsdb_idl_server_has_column(const struct ovsdb_idl *,
* IDL will change the value returned by ovsdb_idl_get_seqno() when the
* column's value changes in any row.
*
* The possible mode combinations are:
* Typical mode combinations are:
*
* - 0, for a column that a client doesn't care about. This is the default
* for every column in every table, if the client passes false for
Expand All @@ -181,10 +181,17 @@ bool ovsdb_idl_server_has_column(const struct ovsdb_idl *,
* - (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | OVSDB_IDL_TRACK), for a column
* that a client wants to track using the change tracking
* ovsdb_idl_track_get_*() functions.
*
* - (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT | OVSDB_IDL_WRITE_CHANGED_ONLY)
* is similar to (OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT) except that it
* only adds a written column to a transaction if the column's value
* has actually changed.
*/
#define OVSDB_IDL_MONITOR (1 << 0) /* Replicate this column? */
#define OVSDB_IDL_ALERT (1 << 1) /* Alert client when column changes? */
#define OVSDB_IDL_TRACK (1 << 2)
#define OVSDB_IDL_TRACK (1 << 2) /* Track column changes? */
#define OVSDB_IDL_WRITE_CHANGED_ONLY \
(1 << 3) /* Write back only changed columns? */

void ovsdb_idl_add_column(struct ovsdb_idl *, const struct ovsdb_idl_column *);
void ovsdb_idl_add_table(struct ovsdb_idl *,
Expand Down Expand Up @@ -233,6 +240,11 @@ bool ovsdb_idl_track_is_updated(const struct ovsdb_idl_row *row,
const struct ovsdb_idl_column *column);
void ovsdb_idl_track_clear(struct ovsdb_idl *);

void ovsdb_idl_set_write_changed_only(struct ovsdb_idl *idl,
const struct ovsdb_idl_column *column,
bool enable);
void ovsdb_idl_set_write_changed_only_all(struct ovsdb_idl *idl, bool enable);


/* Reading the database replica. */

Expand Down
31 changes: 30 additions & 1 deletion tests/ovsdb-idl.at
Original file line number Diff line number Diff line change
Expand Up @@ -87,6 +87,20 @@ m4_define([OVSDB_CHECK_IDL_C],
OVSDB_SERVER_SHUTDOWN
AT_CLEANUP])

# same as OVSDB_CHECK_IDL but uses OVSDB_IDL_WRITE_CHANGED_ONLY.
m4_define([OVSDB_CHECK_IDL_WRITE_CHANGED_ONLY_C],
[AT_SETUP([$1 - write-changed-only - C])
AT_KEYWORDS([ovsdb server idl positive $5])
AT_CHECK([ovsdb_start_idltest])
m4_if([$2], [], [],
[AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore], [ignore])])
AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc -t10 idl unix:socket $3],
[0], [stdout], [ignore])
AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]),
[0], [$4])
OVSDB_SERVER_SHUTDOWN
AT_CLEANUP])

# same as OVSDB_CHECK_IDL but uses tcp.
m4_define([OVSDB_CHECK_IDL_TCP_C],
[AT_SETUP([$1 - C - tcp])
Expand Down Expand Up @@ -257,6 +271,7 @@ m4_define([OVSDB_CHECK_IDL_SSL_PY],

m4_define([OVSDB_CHECK_IDL],
[OVSDB_CHECK_IDL_C($@)
OVSDB_CHECK_IDL_WRITE_CHANGED_ONLY_C($@)
OVSDB_CHECK_IDL_TCP_C($@)
OVSDB_CHECK_IDL_TCP6_C($@)
OVSDB_CHECK_IDL_PY($@)
Expand Down Expand Up @@ -1166,8 +1181,22 @@ m4_define([OVSDB_CHECK_IDL_TRACK_C],
OVSDB_SERVER_SHUTDOWN
AT_CLEANUP])

m4_define([OVSDB_CHECK_IDL_TRACK_WRITE_CHANGED_ONLY_C],
[AT_SETUP([$1 - write-changed-only - C])
AT_KEYWORDS([ovsdb server idl tracking positive $5])
AT_CHECK([ovsdb_start_idltest])
m4_if([$2], [], [],
[AT_CHECK([ovsdb-client transact unix:socket $2], [0], [ignore], [ignore])])
AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc -t10 -c -w idl unix:socket $3],
[0], [stdout], [ignore])
AT_CHECK([sort stdout | uuidfilt]m4_if([$6],,, [[| $6]]),
[0], [$4])
OVSDB_SERVER_SHUTDOWN
AT_CLEANUP])

m4_define([OVSDB_CHECK_IDL_TRACK],
[OVSDB_CHECK_IDL_TRACK_C($@)])
[OVSDB_CHECK_IDL_TRACK_C($@)
OVSDB_CHECK_IDL_TRACK_WRITE_CHANGED_ONLY_C($@)])

OVSDB_CHECK_IDL_TRACK([track, simple idl, initially populated],
[['["idltest",
Expand Down
18 changes: 13 additions & 5 deletions tests/test-ovsdb.c
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,7 @@
VLOG_DEFINE_THIS_MODULE(test_ovsdb);

struct test_ovsdb_pvt_context {
bool write_changed_only;
bool track;
};

Expand Down Expand Up @@ -91,6 +92,7 @@ parse_options(int argc, char *argv[], struct test_ovsdb_pvt_context *pvt)
{"timeout", required_argument, NULL, 't'},
{"verbose", optional_argument, NULL, 'v'},
{"change-track", optional_argument, NULL, 'c'},
{"write-changed-only", optional_argument, NULL, 'w'},
{"magic", required_argument, NULL, OPT_MAGIC},
{"no-rename-open-files", no_argument, NULL, OPT_NO_RENAME_OPEN_FILES},
{"help", no_argument, NULL, 'h'},
Expand Down Expand Up @@ -125,6 +127,10 @@ parse_options(int argc, char *argv[], struct test_ovsdb_pvt_context *pvt)
pvt->track = true;
break;

case 'w':
pvt->write_changed_only = true;
break;

case OPT_MAGIC:
magic = optarg;
break;
Expand Down Expand Up @@ -2610,6 +2616,7 @@ update_conditions(struct ovsdb_idl *idl, char *commands)
static void
do_idl(struct ovs_cmdl_context *ctx)
{
struct test_ovsdb_pvt_context *pvt = ctx->pvt;
struct jsonrpc *rpc;
struct ovsdb_idl *idl;
unsigned int seqno = 0;
Expand All @@ -2618,9 +2625,6 @@ do_idl(struct ovs_cmdl_context *ctx)
int step = 0;
int error;
int i;
bool track;

track = ((struct test_ovsdb_pvt_context *)(ctx->pvt))->track;

idl = ovsdb_idl_create(ctx->argv[1], &idltest_idl_class, true, true);
ovsdb_idl_set_leader_only(idl, false);
Expand All @@ -2637,10 +2641,14 @@ do_idl(struct ovs_cmdl_context *ctx)
rpc = NULL;
}

if (track) {
if (pvt->track) {
ovsdb_idl_track_add_all(idl);
}

if (pvt->write_changed_only) {
ovsdb_idl_set_write_changed_only_all(idl, true);
}

setvbuf(stdout, NULL, _IONBF, 0);

symtab = ovsdb_symbol_table_create();
Expand Down Expand Up @@ -2683,7 +2691,7 @@ do_idl(struct ovs_cmdl_context *ctx)
}

/* Print update. */
if (track) {
if (pvt->track) {
print_idl_track(idl, step++, terse);
ovsdb_idl_track_clear(idl);
} else {
Expand Down

0 comments on commit d94cd0d

Please sign in to comment.