Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement naive clamping of the chunk cache size for mining worker #660

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

shizzard
Copy link
Collaborator

@shizzard shizzard commented Dec 2, 2024

No description provided.

h2_hashes = #{}
name = not_set :: atom(),
partition_number = not_set :: non_neg_integer() | not_set,
diff_pair = not_set, % :: ?DIFF_PAIR()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vird what is the diff pair here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A pair of mining difficulties, the first element is the mining difficulty for one-chunk solutions, the second element - for two-chunk solutions.

State#state{
sub_chunk_cache_size = maps:put(SessionKey, CacheSize + Delta,
State#state.sub_chunk_cache_size) }.
NewCacheSize = ar_util:clamp_lower(CacheSize + Delta, 0),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key change: do not let the sub_chunk_cache_size go below zero, as this does not make any sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't we fighting the symptom? Perhaps, let's leave a comment this is a temporary measure until we figure out why we subtract more than we add, if that is the case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps, yes, you're right; proper fix will require the deep (sub)chunk cache refactoring, which is out of scope for this task.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shizzard what about adding a log_warning message whenever CacheSize+Delta is below zero? We do want to fix the OOMs as quick as possible and clamping may help there, but I'm a little worried we may also lose this valuable signal that something is wrong. Which may make tracking down the true root cause harder.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I slacked with some more information that might help with reproducing the negative cache size (and so might help get at the root cause)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about adding a log_warning message

Thats a very good idea, will do.

%% @doc Clamp a value to a lower and upper bound.
%% If the value is less than the lower bound, return the lower bound.
%% If the value is greater than the upper bound, return the upper bound.
clamp(Value, LowerBound, UpperBound) ->
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: ar_util:between/3 also does this: https://github.com/ArweaveTeam/arweave/pull/660/files#diff-1f26621f4ea9ef47e071afce0f4ce9fc3e6bacc76089ac9037c00e0c07a5ce9eR46

Can you add a comment to ar_util:between that people should start using ar_util:clamp instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, didn't see the existing one, okay.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants