feature(ArtifactsTest): IOTune parameters validation #10121
base: master
Conversation
artifacts_test.py (Outdated)
@@ -325,6 +326,14 @@ def test_scylla_service(self):
with self.subTest("check ENA support"):
    assert self.node.ena_support, "ENA support is not enabled"

if ("gce" in backend or "aws" in backend or "azure" in backend):
Is it going to work with e.g. k8s-local-kind-aws?
I'd propose to use if backend in ("gce", "aws", "azure") instead.
Also, shouldn't it be used only with predefined images (if params.get('use_preinstalled_scylla'))?
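A minimal sketch of the proposed condition (the subtest name and body below are placeholders; the self.params access follows the surrounding test code):

backend = self.params.get("cluster_backend")
if backend in ("gce", "aws", "azure") and self.params.get("use_preinstalled_scylla"):
    # exact backend match avoids accidentally running on e.g. k8s-local-kind-aws,
    # and the check only makes sense when a predefined image is used
    with self.subTest("check iotune parameters"):
        ...  # run the io_properties.yaml validation here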
Good catch, I was testing and removed use_preinstalled_scylla, will put it back.
Done
sdcm/utils/validators/iotune.py (Outdated)
diff = (val / preset_val - 1) * 100
LOGGER.info("[%s] %s: %s (%.0f)", mountpoint, key, val, diff)

def _submit_results_to_argus(self):
This should be done outside of IOTuneValidator, which should do only one thing: validate the correctness of the predefined io_properties against the actual values. Things like submitting results or printing to the console could be done separately so as not to blur the purpose of this class.
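A rough sketch of that split (method and attribute names here are illustrative, not the actual SCT API): the validator only compares the preset values against the measured ones, while printing and submission happen in the tester body.

class IOTuneValidator:
    def validate(self) -> dict:
        """Return per-metric deviation (in %) of measured values vs io_properties.yaml."""
        deviations = {}
        for key, preset_val in self.preset_metrics.items():  # numeric metrics only, hypothetical attribute
            deviations[key] = (self.measured_metrics[key] / preset_val - 1) * 100
        return deviations

# In the tester body - reporting stays outside of the validator:
deviations = validator.validate()
LOGGER.info("iotune deviations: %s", deviations)
submit_results_to_argus(deviations)  # hypothetical submission helper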
Okay, I'll move the submission to the tester body.
Moved to argus_results, kept the console output in the validator
sdcm/utils/validators/iotune.py (Outdated)
ValidationRules = {
    "read_iops": ValidationRule(best_pct=10, fixed_limit=bottom_limit(preset_disk.get("read_iops"))),
    "read_bandwidth": ValidationRule(best_pct=10, fixed_limit=bottom_limit(preset_disk.get("read_iops"))),
    "write_iops": ValidationRule(best_pct=10, fixed_limit=bottom_limit(preset_disk.get("read_iops"))),
    "write_bandwidth": ValidationRule(best_pct=10, fixed_limit=bottom_limit(preset_disk.get("read_iops"))),
}
I think the best_pct rule is not right.
I'd propose a different approach: instead of sending raw metrics, send the differences as absolute values. This way higher_is_better could be set to False, with a fixed limit calculated from the predefined io_properties.yaml metric * x%. This should work for any instance size.
In that case the table name should be renamed to f"{self.params.get('cluster_backend')} - {self.node.db_node_instance_type} IO Tune absolute deviation".
The only drawback is that when we need to know the exact value we would have to look into the logs (this could be tackled by publishing an error event with full details in that case).
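A sketch of that idea (the 10% threshold is only an example, not a value agreed in this thread): submit |measured - preset| per metric, with higher_is_better=False and a fixed limit derived from the io_properties.yaml value.

THRESHOLD_PCT = 10  # example x%; the real value is up to the PR author

for key, preset_val in preset_disk.items():
    if key == "mountpoint":
        continue
    deviation = abs(measured_disk[key] - preset_val)  # measured_disk is an assumed dict of live iotune values
    fixed_limit = preset_val * THRESHOLD_PCT / 100
    # with higher_is_better=False, Argus flags the metric once deviation > fixed_limit
    LOGGER.info("[%s] %s: absolute deviation %s (limit %s)", preset_disk["mountpoint"], key, deviation, fixed_limit)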
We can send both I think, and that's what I was thinking as well - I'll look into doing it that way.
Done, I think. We might want to send the baseline as well.
sdcm/utils/validators/iotune.py (Outdated)
if tested_mountpoint != preset_disk["mountpoint"]:
    LOGGER.warning("Disks differ - probably a mistake: %s vs %s, will not submit iotune results",
                   tested_mountpoint, preset_disk["mountpoint"])
    return
I think we shouldn't fail silently here. Better to raise an error event.
Will do
Done
Force-pushed from 4e35601 to 9d35c02
Force-pushed from c9cd843 to 677fede
This commit adds a new subtest inside the Artifacts test that checks the machine image's io_properties.yaml by comparing it to the actual machine values and showing the deviation. Fixes scylladb/qa-tasks#1787
Force-pushed from 677fede to 079bbc7
The send_iotune_results_to_argus function does validation and other operations unrelated to sending results to Argus.
These should be part of the validator, and the validator should prepare the data for this function (this one should just send, with minimum logic).
if not preset_disk:
    LOGGER.warning("Unable to continue - node should have io_properties.yaml, but it doesn't.")
    TestFrameworkEvent(source="send_iotune_results_to_argus",
                       message="Unable to continue - node should have io_properties.yaml, but it doesn't.",
                       severity=Severity.ERROR).publish()
    return
This doesn't belong here. It could possibly be moved to IOTuneValidator._read_io_properties.
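One possible shape for that move (a sketch only; the parsing helper name is made up, and whether to return early or raise is left to the author):

def _read_io_properties(self):
    preset_disk = self._parse_io_properties_yaml()  # hypothetical helper reading the file from the node
    if not preset_disk:
        TestFrameworkEvent(source=self.__class__.__name__,
                           message="Node should have io_properties.yaml, but it doesn't.",
                           severity=Severity.ERROR).publish()
        return None
    return preset_disk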
ColumnMetadata(name="write_bandwidth", unit="bps", type=ResultType.INTEGER, higher_is_better=True),
]

ValidationRules = {
redundant
class IOPropertiesResultsTable(GenericResultTable):
    class Meta:
        name = f"{params.get('cluster_backend')} - {node.db_node_instance_type} Disk Performance"
        description = "io_properties.yaml comparison with live data"
It's no longer a comparison but rather pure data from io_properties (the row name should define 'image preset' or 'measured').
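A sketch of rows named that way (the add_result call mirrors how other GenericResultTable instances are filled in sdcm/argus_results.py; treat the exact signature and Status value as assumptions):

table = IOPropertiesResultsTable()
for key in ("read_iops", "read_bandwidth", "write_iops", "write_bandwidth"):
    table.add_result(column=key, row="image preset", value=preset_disk[key], status=Status.UNSET)
    table.add_result(column=key, row="measured", value=measured_disk[key], status=Status.UNSET)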
This commit adds a new subtest inside the Artifacts test that checks
the machine image's io_properties.yaml by comparing it to the actual
machine values and showing the deviation. The results are both printed
to the console with the deviation value and submitted to Argus as a
GenericResultTable.
Fixes scylladb/qa-tasks#1787
Testing

PR pre-checks (self review)
- backport labels

Reminders
- Document new configuration options (in sdcm/sct_config.py)
- Add unit tests (under the unit-test/ folder)