feature(ArtifactsTest): IOTune parameters validation #10121

k0machi · 2025-02-18T16:46:05Z

This commit adds a new subtest to the Artifacts test that checks the
machine image's io_properties.yaml by comparing its values to the values
measured on the actual machine and showing the deviation. The results are
printed to the console with the deviation value and submitted to Argus as a
GenericResultTable.

Fixes scylladb/qa-tasks#1787

Testing

PR pre-checks (self review)

I added the relevant backport labels
I didn't leave commented-out/debugging code

Reminders

Add new configuration options and document them (in sdcm/sct_config.py)
Add unit tests to cover my changes (under unit-test/ folder)
Update the Readme/doc folder relevant to this change (if needed)

k0machi · 2025-02-18T16:46:48Z

Demo:

17:42:50  < t:2025-02-18 16:42:49,047 f:iotune.py       l:55   c:sdcm.utils.validators.iotune p:INFO  > Disk performance values validation - testing /var/lib/scylla
17:42:50  < t:2025-02-18 16:42:49,047 f:iotune.py       l:59   c:sdcm.utils.validators.iotune p:INFO  > [/var/lib/scylla] read_iops: 109519 (-0)
17:42:50  < t:2025-02-18 16:42:49,047 f:iotune.py       l:59   c:sdcm.utils.validators.iotune p:INFO  > [/var/lib/scylla] read_bandwidth: 806913408 (6)
17:42:50  < t:2025-02-18 16:42:49,047 f:iotune.py       l:59   c:sdcm.utils.validators.iotune p:INFO  > [/var/lib/scylla] write_iops: 59110 (-3)
17:42:50  < t:2025-02-18 16:42:49,047 f:iotune.py       l:59   c:sdcm.utils.validators.iotune p:INFO  > [/var/lib/scylla] write_bandwidth: 559601600 (-0)
17:42:50  < t:2025-02-18 16:42:50,119 f:artifacts_test.py l:280  c:ArtifactsTest        p:INFO  > Verify XFS mount options for /var/lib/scylla contain `discard'

soyacz · 2025-02-19T10:53:44Z

artifacts_test.py

@@ -325,6 +326,14 @@ def test_scylla_service(self):
            with self.subTest("check ENA support"):
                assert self.node.ena_support, "ENA support is not enabled"

+        if ("gce" in backend or "aws" in backend or "azure" in backend):


Is this going to work with e.g. k8s-local-kind-aws?
I'd propose using: if backend in ("gce", "aws", "azure")
Also, shouldn't it run only with predefined images (if params.get('use_preinstalled_scylla'))?

Good catch. I removed use_preinstalled_scylla while testing; I'll put it back.
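
The difference between the two conditions discussed above can be seen in a small sketch. The helper names and the IMAGE_BACKENDS constant are illustrative assumptions, not SCT code; only the backend strings come from the thread.

```python
# Backends named in the condition from the diff above.
IMAGE_BACKENDS = ("gce", "aws", "azure")


def matches_by_substring(backend: str) -> bool:
    """Condition as written in the diff: substring search."""
    return "gce" in backend or "aws" in backend or "azure" in backend


def matches_exactly(backend: str) -> bool:
    """Condition proposed in review: exact membership in a tuple."""
    return backend in IMAGE_BACKENDS


# Print both results side by side for a plain and a composite backend name.
for backend in ("aws", "k8s-local-kind-aws"):
    print(backend, matches_by_substring(backend), matches_exactly(backend))
```

The substring form also matches composite backend names such as k8s-local-kind-aws, while the membership form matches only the three exact names.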

soyacz · 2025-02-19T11:03:09Z

sdcm/utils/validators/iotune.py

+                diff = (val / preset_val - 1) * 100
+                LOGGER.info("[%s] %s: %s (%.0f)", mountpoint, key, val, diff)
+
+    def _submit_results_to_argus(self):


This should be done outside of IOTuneValidator, which should do only one thing: validate the predefined io_properties against the actual values. Submitting results or printing to the console could be done separately so that the purpose of this class isn't blurred.

okay, I'll move submission to the tester body.

Moved to argus_results, kept the console output in the validator
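
For reference, the deviation logged by the validator is a signed percentage. A minimal sketch of the formula from the diff above, with hypothetical preset and measured values (the real ones come from io_properties.yaml and the measurement on the node):

```python
def deviation_pct(measured: float, preset: float) -> float:
    """Signed deviation of the measured value from the preset, in percent."""
    return (measured / preset - 1) * 100


# Hypothetical values, chosen only to illustrate the arithmetic.
preset = {"read_iops": 100_000, "write_iops": 50_000}
measured = {"read_iops": 99_700, "write_iops": 48_500}

for key, preset_val in preset.items():
    diff = deviation_pct(measured[key], preset_val)
    # Same format as the log call in the diff: value, then rounded deviation.
    print("[%s] %s: %s (%.0f)" % ("/var/lib/scylla", key, measured[key], diff))
```

With the %.0f format, a small negative deviation such as -0.3% is printed as -0, which would explain the (-0) entries in the demo output.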

soyacz · 2025-02-19T11:12:51Z

sdcm/utils/validators/iotune.py

+                ValidationRules = {
+                    "read_iops": ValidationRule(best_pct=10, fixed_limit=bottom_limit(preset_disk.get("read_iops"))),
+                    "read_bandwidth": ValidationRule(best_pct=10, fixed_limit=bottom_limit(preset_disk.get("read_iops"))),
+                    "write_iops": ValidationRule(best_pct=10, fixed_limit=bottom_limit(preset_disk.get("read_iops"))),
+                    "write_bandwidth": ValidationRule(best_pct=10, fixed_limit=bottom_limit(preset_disk.get("read_iops"))),
+                }


I think the best_pct rule is not right.
I'd propose a different approach: instead of sending raw metrics, send the differences as absolute values. That way higher_is_better could be set to False, with a fixed limit calculated as the predefined io_properties.yaml metric * x%. This should work for any instance size.
In that case the table should be renamed to f"{self.params.get('cluster_backend')} - {self.node.db_node_instance_type} IO Tune absolute deviation"
The only drawback is that when we need the exact value we would have to look into the logs (this could be tackled by publishing an error event with full details in that case).

We can send both I think, and that's what I was thinking as well - I'll look into doing it that way.

Done, I think. We might want to send the baseline as well.
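
One possible reading of the absolute-deviation proposal, as a minimal sketch. It assumes the deviation is reported in the metric's own units and that x is 10; both are assumptions, and the function names are hypothetical rather than part of the PR.

```python
def absolute_deviation(measured: int, preset: int) -> int:
    """Unsigned difference between measured and preset values (lower is better)."""
    return abs(measured - preset)


def fixed_limit(preset: int, tolerance_pct: float = 10) -> float:
    """Largest acceptable deviation: the preset metric * x%. x=10 is an assumption."""
    return preset * tolerance_pct / 100


def within_limit(measured: int, preset: int, tolerance_pct: float = 10) -> bool:
    """True when the deviation does not exceed the fixed limit."""
    return absolute_deviation(measured, preset) <= fixed_limit(preset, tolerance_pct)


# Hypothetical numbers: the limit scales with the preset, so it fits any instance size.
print(within_limit(measured=94_000, preset=100_000))  # deviation 6000, limit 10000.0
print(within_limit(measured=85_000, preset=100_000))  # deviation 15000, limit 10000.0
```

Because the limit is derived from each preset value, the same rule applies to small and large instance types without per-instance thresholds.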

soyacz · 2025-02-19T11:17:06Z

sdcm/utils/validators/iotune.py

+        if tested_mountpoint != preset_disk["mountpoint"]:
+            LOGGER.warning("Disks differ - probably a mistake: %s vs %s, will not submit iotune results",
+                           tested_mountpoint, preset_disk["mountpoint"])
+            return


I think we shouldn't fail silently here. Better to raise an error event.
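
A minimal sketch of the suggestion, using a stand-in event class so that it runs on its own. ErrorEvent, PUBLISHED, and check_mountpoint are hypothetical names for illustration; in SCT the event would presumably be published through the framework's own event classes.

```python
from dataclasses import dataclass

# Stand-in for the framework's event pipeline (assumption, not SCT code).
PUBLISHED = []


@dataclass
class ErrorEvent:
    source: str
    message: str

    def publish(self) -> None:
        """Record the event so that the mismatch is visible in test results."""
        PUBLISHED.append(self)


def check_mountpoint(tested_mountpoint: str, preset_disk: dict) -> bool:
    """Return True when mountpoints match; publish an error event otherwise."""
    if tested_mountpoint != preset_disk["mountpoint"]:
        ErrorEvent(
            source="IOTuneValidator",
            message=f"Disks differ: {tested_mountpoint} vs {preset_disk['mountpoint']}",
        ).publish()
        return False
    return True
```

The caller can still skip submission when the check returns False, but the mismatch is now reported as an error instead of only a warning in the log.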


soyacz

The send_iotune_results_to_argus function does validation and other operations unrelated to sending results to Argus.
These should be part of the validator, and the validator should prepare the data for this function (which should just send it, with minimal logic).
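
The split requested here could look roughly like the sketch below: one function prepares the rows, the other only forwards them. IOTuneRow, prepare_rows, and send_rows are hypothetical names, and the submit callable stands in for the real Argus client, which is not shown in this thread.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class IOTuneRow:
    metric: str
    preset: int
    measured: int
    deviation_pct: float


def prepare_rows(preset: dict, measured: dict) -> list[IOTuneRow]:
    """Validator side: compare presets with measurements and build rows."""
    return [
        IOTuneRow(key, preset[key], measured[key], (measured[key] / preset[key] - 1) * 100)
        for key in preset
    ]


def send_rows(rows: list[IOTuneRow], submit: Callable[[IOTuneRow], None]) -> None:
    """Sender side: no validation, just hand each prepared row to the client."""
    for row in rows:
        submit(row)
```

With this shape the sender can be exercised with any callable, for example a list's append method, without touching the validation logic.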

soyacz · 2025-02-25T07:56:49Z

sdcm/argus_results.py

+    if not preset_disk:
+        LOGGER.warning("Unable to continue - node should have io_properties.yaml, but it doesn't.")
+        TestFrameworkEvent(source="send_iotune_results_to_argus",
+                           message="Unable to continue - node should have io_properties.yaml, but it doesn't.",
+                           severity=Severity.ERROR).publish()
+        return


This doesn't belong here. It could possibly be moved to IOTuneValidator._read_io_properties.

soyacz · 2025-02-25T07:58:00Z

sdcm/argus_results.py

+                ColumnMetadata(name="write_bandwidth", unit="bps", type=ResultType.INTEGER, higher_is_better=True),
+            ]
+
+            ValidationRules = {


soyacz · 2025-02-25T08:00:16Z

sdcm/argus_results.py

+    class IOPropertiesResultsTable(GenericResultTable):
+        class Meta:
+            name = f"{params.get('cluster_backend')} - {node.db_node_instance_type} Disk Performance"
+            description = "io_properties.yaml comparison with live data"


It's no longer a comparison but rather pure data from io_properties (the row name should indicate 'image preset' or 'measured').

k0machi self-assigned this Feb 18, 2025

k0machi requested review from fruch and soyacz February 18, 2025 16:46

k0machi added the backport/2024.2, backport/6.2, and backport/2025.1 labels Feb 18, 2025

soyacz reviewed Feb 19, 2025


k0machi force-pushed the artifacts-iotune-params-verification branch 4 times, most recently from 4e35601 to 9d35c02, February 25, 2025 03:22

k0machi requested a review from soyacz February 25, 2025 03:22

k0machi force-pushed the artifacts-iotune-params-verification branch 4 times, most recently from c9cd843 to 677fede, February 25, 2025 03:39

feature(ArtifactsTest): IOTune parameters validation

079bbc7


k0machi force-pushed the artifacts-iotune-params-verification branch from 677fede to 079bbc7, February 25, 2025 03:39

soyacz requested changes Feb 25, 2025
