
[Refactor] Global stack map. #2628

Open · d0cd wants to merge 4 commits into staging from feat/global-stack-map
Conversation

@d0cd d0cd (Collaborator) commented Mar 4, 2025

This PR removes external_stacks from Stack and references all stacks through the global stack map in Process.
This significantly reduces the bookkeeping needed to keep Process consistent as stacks are modified or evicted from memory.
The changes made in this PR should be semantically equivalent to the prior design.
While the weak reference from Stack to Process may have some overhead, it should not introduce memory leaks.

An example

Suppose we have the following programs:

  • one.aleo
  • two.aleo, which imports one.aleo
  • three.aleo, which imports one.aleo and two.aleo
  • four.aleo, which imports one.aleo, two.aleo and three.aleo

In the existing design, the Process and Stacks are organized in memory as follows:
[diagram: existing Process and Stack layout]

The new design is organized as follows:
[diagram: new layout with the global stack map]

The blue arrows are Arcs and the red arrows are Weaks.
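
As a sketch of the intended ownership structure, the layout above can be expressed roughly as follows. This is illustrative only: apart from the stacks map, get_stack, and the Arc/Weak relationship, the type and field names are simplified assumptions rather than the actual snarkVM definitions.

    use std::collections::HashMap;
    use std::sync::{Arc, Weak};

    use parking_lot::RwLock;

    // Shared interior of `Process`: the single, global map of program ID -> stack.
    // `ProgramID<N>` is reduced to a `String` for brevity.
    struct ProcessInner {
        stacks: RwLock<HashMap<String, Arc<Stack>>>,
    }

    struct Process {
        inner: Arc<ProcessInner>,
    }

    // Each stack keeps only a weak back-reference to the shared state, so imports
    // are resolved through the global map on demand. There is no `external_stacks`
    // field, and no `Arc` cycle that could keep stacks alive after eviction.
    struct Stack {
        program_id: String,
        process: Weak<ProcessInner>,
    }

    impl Stack {
        // Hypothetical helper: resolve an imported program (e.g. `one.aleo` from
        // `four.aleo`) through the global map instead of a locally owned copy.
        fn imported_stack(&self, id: &str) -> Option<Arc<Stack>> {
            // `upgrade` returns `None` only if the owning `Process` has been dropped.
            let process = self.process.upgrade()?;
            let stack = process.stacks.read().get(id).cloned();
            stack
        }
    }

In this layout every Stack is owned exactly once, by the map in Process (the blue Arc arrows), while each Stack only points upward through a non-owning Weak (the red arrows), which is why the refactor should not introduce reference cycles or leaks.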

@d0cd d0cd mentioned this pull request Mar 4, 2025
@d0cd d0cd requested review from ljedrz, raychu86, vicsn and niklaslong March 4, 2025 23:14
@ljedrz ljedrz (Collaborator) left a comment

I've left a few comments, but I'm in favor of these changes in general. Is there some specific benchmark where the memory impact could be measured? I'd be happy to run the numbers.

@d0cd d0cd closed this Mar 5, 2025
@d0cd d0cd force-pushed the feat/global-stack-map branch from 7acf383 to 6734bc4 Compare March 5, 2025 23:56
@d0cd d0cd reopened this Mar 6, 2025
@d0cd d0cd changed the base branch from feat/update-deployment-id-map to staging March 6, 2025 00:01
@d0cd d0cd force-pushed the feat/global-stack-map branch from 7acf383 to 2c10b1d Compare March 6, 2025 00:01
@d0cd d0cd requested review from ljedrz and vicsn March 6, 2025 00:11
@d0cd d0cd (Collaborator, Author) commented Mar 6, 2025

@ljedrz I have a branch here where I test building a Process on all of the programs on mainnet and testnet. It could be helpful in sniffing out performance issues and memory leaks. Note that it currently points to a rev for this PR.

Comment on lines +213 to +225

        self.stacks.read().contains_key(program_id)
    }

    /// Returns the stack for the given program ID.
    #[inline]
-   pub fn get_stack(&self, program_id: impl TryInto<ProgramID<N>>) -> Result<&Arc<Stack<N>>> {
+   pub fn get_stack(&self, program_id: impl TryInto<ProgramID<N>>) -> Result<Arc<Stack<N>>> {
        // Prepare the program ID.
        let program_id = program_id.try_into().map_err(|_| anyhow!("Invalid program ID"))?;
        // Acquire the read lock.
        let stacks = self.stacks.read();
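
As context for the signature change above, here is a minimal sketch (simplified types, not the actual snarkVM code) of why get_stack now returns an owned Arc<Stack<N>> rather than &Arc<Stack<N>>: a reference would have to borrow from the RwLock read guard, which is dropped when the function returns, so the Arc is cloned out of the map instead (a cheap reference-count bump) and the lock is released immediately.

    use std::collections::HashMap;
    use std::sync::Arc;

    use parking_lot::RwLock;

    struct Stack;

    struct Process {
        stacks: RwLock<HashMap<String, Arc<Stack>>>,
    }

    impl Process {
        fn get_stack(&self, program_id: &str) -> Option<Arc<Stack>> {
            // Cloning the `Arc` lets the read guard drop at the end of this
            // statement, so callers never hold the global lock while they use
            // the returned stack.
            self.stacks.read().get(program_id).cloned()
        }
    }
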
Collaborator

Is there any chance that the reads happen concurrently using e.g. rayon?

@kpandl found that my recent attempt at a stacks cache resulted in a deadlock; the cause still needs to be investigated, but one potential reason could be ProvableHQ/snarkOS@970386f.

@kpandl it would be great if you could run a stress test with the same setup against this PR. You can let it run for half a day, and if there is no halt, increase the request pressure and we'll try again for another half day.

Collaborator Author (@d0cd)

There's a broader question about the expected concurrency model of Process.

I am assuming that there is a main thread which has R/W privilege and peripheral threads with R privilege.
One concern could be that the main thread runs into a read-write deadlock, but we've kept lock scopes minimal to avoid that. Perhaps the assumptions are wrong.

Curious to see the results of the stress test.

Collaborator

Just to clarify what I think happened in ProvableHQ/snarkOS@970386f, in order:

  1. rayon spawns a number of threads which want to acquire the read-lock.
  2. The main thread wants to acquire the write-lock.
  3. One of the aforementioned rayon threads wants to acquire a new read-lock.

However, rayon will only end all threads and release all read-locks atomically, so this new read-lock can neither be acquired nor released, nor can the pending write-lock be acquired.
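
As a hypothetical sketch (not code from snarkOS or this PR) of how that ordering can wedge with parking_lot's fair RwLock: once a writer is parked, parking_lot queues new read acquisitions behind it, so a thread that already holds a read-lock and then requests a second one can deadlock with the waiting writer.

    use std::sync::Arc;
    use std::time::Duration;

    use parking_lot::RwLock;
    use rayon::prelude::*;

    fn main() {
        let lock = Arc::new(RwLock::new(0u64));

        let workers = {
            let lock = Arc::clone(&lock);
            std::thread::spawn(move || {
                (0..64).into_par_iter().for_each(|_| {
                    // Step 1: rayon worker threads take read-locks.
                    let _outer = lock.read();
                    std::thread::sleep(Duration::from_millis(200));
                    // Step 3: a further read-lock is requested while the writer
                    // below is parked; it is queued behind the writer, so neither
                    // side can make progress.
                    let _inner = lock.read();
                });
            })
        };

        // Step 2: the main thread asks for the write-lock and parks until all
        // outstanding read-locks are released.
        std::thread::sleep(Duration::from_millis(20));
        let _writer = lock.write();

        // This sketch is expected to hang here rather than complete.
        workers.join().unwrap();
    }
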

Collaborator Author (@d0cd)

That's helpful context! I believe we are using parking_lot::RwLock which should prevent writer starvation. Perhaps there's a subtle way this is getting triggered.

@d0cd d0cd requested review from ljedrz and vicsn March 6, 2025 15:14
@ljedrz ljedrz (Collaborator) left a comment

LGTM 👌

@d0cd d0cd force-pushed the feat/global-stack-map branch from d45aeed to 919bee6 Compare March 11, 2025 16:35