Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test(default): set tablets_initial_scale_factor scylla config option #10188

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

vponomaryov
Copy link
Contributor

@vponomaryov vponomaryov commented Feb 24, 2025

Make it be bigger than the current scylla default '1' to avoid load balance issues.

Ref: scylladb/scylladb#22522

Testing

  • [ ]

PR pre-checks (self review)

  • I added the relevant backport labels
  • I didn't leave commented-out/debugging code

Reminders

  • Add New configuration option and document them (in sdcm/sct_config.py)
  • Add unit tests to cover my changes (under unit-test/ folder)
  • Update the Readme/doc folder relevant to this change (if needed)

Copy link
Contributor

@soyacz soyacz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unit tests fail

@@ -1,3 +1,6 @@
append_scylla_yaml:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add link to issue why we need that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw. I think this means initial tablets will be 2*2^10 isn't it? (see configurations/tablets-initial-32.yaml where 4 means 32 tablets).
This approach also is not taking into account number of shards, but for the sake of quick workaround might be good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized, this param changed its purpose... scylladb/scylladb@f043c83

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and this change didn't yet reached 2025.1 branch.

@mykaul
Copy link
Contributor

mykaul commented Feb 24, 2025

@bhalevy - is that a reasonable default?

@tgrabiec
Copy link

@bhalevy - is that a reasonable default?

This is the value which we're about to use in scylla by default, so yes.

Make it be bigger than the current scylla default '1' to avoid
load balance issues.

Ref: scylladb/scylladb#22522
@vponomaryov vponomaryov force-pushed the tablets_initial_scale_factor branch from 7abaf37 to 3058daf Compare February 24, 2025 16:28
@vponomaryov vponomaryov requested a review from soyacz February 24, 2025 16:29
Copy link
Contributor

@pehala pehala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect way of achieving this, tablets_initial_scale_factor should not be used by users, it will also undergo a change with scylladb/scylladb#22522. Initial option on keyspace should be used instead:

CREATE KEYSPACE excalibur
WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'replication_factor': 3,
} AND tablets = {
    'initial': 2048
};

@vponomaryov
Copy link
Contributor Author

vponomaryov commented Feb 24, 2025

This is incorrect way of achieving this, tablets_initial_scale_factor should not be used by users, it will also undergo a change with scylladb/scylladb#22522. Initial option on keyspace should be used instead:

CREATE KEYSPACE excalibur
WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'replication_factor': 3,
} AND tablets = {
    'initial': 2048
};

Users? It is used for setting up DB clusters.

Then, updating any table definition we create in tests is not applicable,
In this case one should update enormous number of places and not only in SCT.

So, the solution must be only DB cluster config update, not per-ks/table.

@pehala
Copy link
Contributor

pehala commented Feb 24, 2025

Users? It is used for setting up DB clusters.

ScyllaDB users should not use this option (according to developers), so we should not use it as well.

Then, updating any table definition we create in tests is not applicable, In this case one should update enormous number of places and not only in SCT.

Other than SCT nothing is performance bound, missing this setting causes performance problems only. It is true that updating this properly in SCT would be a big change as we cannot do it as a default though.

So, the solution must be only DB cluster config update, not per-ks/table.

It is per-ks/table now and it will be in the future (Although the initial will get replaced by other options and hopefully will work reasonably well out of the box).

If this is meant as a hotfix sure, we can do this like it temporarily, but this cannot stay as a permanent solution for this problem.

@pehala
Copy link
Contributor

pehala commented Feb 24, 2025

Also good to mention, the new solution is targeted pretty soon, for 2025.1.1. So it might make more sense to do this hotfix only for 2025.1.0

@vponomaryov
Copy link
Contributor Author

Users? It is used for setting up DB clusters.

ScyllaDB users should not use this option (according to developers), so we should not use it as well.

I think we understand different things under the "users" word.
We update scylla config in lots of places, because we need to test various things.
If you disagree to update scylla config and no one going to update all (really all) the keyspace definition places then what else we have?

Then, updating any table definition we create in tests is not applicable, In this case one should update enormous number of places and not only in SCT.

Other than SCT nothing is performance bound, missing this setting causes performance problems only. It is true that updating this properly in SCT would be a big change as we cannot do it as a default though.

The https://github.com/scylladb/scylla-qa-internal is other and it stores lots of schema and query configs used in our CI testing.
I filed several different bugs using longevities, not performance jobs.
Those longevities were using schema defined in that repo.

So, the solution must be only DB cluster config update, not per-ks/table.

It is per-ks/table now and it will be in the future (Although the initial will get replaced by other options and hopefully will work reasonably well out of the box).

If this is meant as a hotfix sure, we can do this like it temporarily, but this cannot stay as a permanent solution for this problem.

This is not hotfix, it is workaround for the Scylla bug.
And yes, it is needed at first, for testing 2025.1 rc, at second - any tablets testing.

@pehala
Copy link
Contributor

pehala commented Feb 24, 2025

Users? It is used for setting up DB clusters.

ScyllaDB users should not use this option (according to developers), so we should not use it as well.

I think we understand different things under the "users" word. We update scylla config in lots of places, because we need to test various things. If you disagree to update scylla config and no one going to update all (really all) the keyspace definition places then what else we have?

I agree to update scylla config temporarily but I do not think it is right solution permanently. SCT is the closest thing we have to a real production clusters, we should use options that we recommend customers to use. Recommended way is NOT to set this config but use initial tablets option, thus we should use that. I understand it is not currently feasible, so we can use this method for now, but we need to change it, once the per-table hints + tablet scale factor patches drop and do this properly.

So, the solution must be only DB cluster config update, not per-ks/table.

It is per-ks/table now and it will be in the future (Although the initial will get replaced by other options and hopefully will work reasonably well out of the box).
If this is meant as a hotfix sure, we can do this like it temporarily, but this cannot stay as a permanent solution for this problem.

This is not hotfix, it is workaround for the Scylla bug. And yes, it is needed at first, for testing 2025.1 rc, at second - any tablets testing.

This is required for 2025.1-rc as workaround, but it cannot be used for any tablets testing. If we find any performance issues with this option enabled they will be throwned out, due to invalid configuration. Only results with initial tablets are valid.

TLDR: This is a bad way of configuring this, but we dont have a choice with current SCT setup. We need to treat it like a temporary workaround that will be removed as soon as we can and we definitely cannot use this for any performance measurements.

@@ -1,3 +1,9 @@
# NOTE: set the 'tablets_initial_scale_factor' explicitly while the following
# Scylla change is not merged and backported to older scylla branches.
# https://github.com/scylladb/scylladb/pull/22522
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This will never not get backported to older branches, but once this drops, we should remove this either way

@roydahan
Copy link
Contributor

@pehala I was under the impression that we should not change this parameter and use only the CQL option (after your discussion with @bhalevy).
Now we transitioned back to use this default after running almost all tabelts tests?

@pehala
Copy link
Contributor

pehala commented Feb 25, 2025

@pehala I was under the impression that we should not change this parameter and use only the CQL option (after your discussion with @bhalevy). Now we transitioned back to use this default after running almost all tabelts tests?

That is true, this is meant as workaround for failing tests (scylladb/scylladb#22522), since we cannot do globally CQL option, this is another thing. I would still prefer if we did CQL change only for tests that fail and not do this globally, but if this can unblock 2025.1 testing, I can accept it temporarily, but like I said in the previous comment, this global change has consequences on validity of our performance tests

@fruch
Copy link
Contributor

fruch commented Feb 25, 2025

Also good to mention, the new solution is targeted pretty soon, for 2025.1.1. So it might make more sense to do this hotfix only for 2025.1.0

we can't easily do like that in configuration, that would need to be in code.

I would suggest doing this one only to 2025.1 branch (not master), and revert it once we start 2025.1.1 testing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants