
[Refactor] Global stack map. #2628

Open · d0cd wants to merge 4 commits into staging from feat/global-stack-map
Conversation

@d0cd d0cd (Collaborator) commented Mar 4, 2025

This PR removes external_stacks from Stack and references all stacks through the global stack map in Process.
This significantly reduces the bookkeeping needed to keep Process consistent as stacks are modified or evicted from memory.
The changes made in this PR should be semantically equivalent to the prior design.
While the weak reference from Stack to Process may have some overhead, it should not introduce memory leaks.

An example

Suppose we have the following programs:

  • one.aleo
  • two.aleo, which imports one.aleo
  • three.aleo, which imports one.aleo and two.aleo
  • four.aleo, which imports one.aleo, two.aleo and three.aleo

In the existing design, the Process and Stacks are organized in memory as follows:
[diagram: existing Process and Stack layout]

The new design is organized as follows:
[diagram: new layout with the global stack map]

The blue arrows are Arcs and the red arrows are Weaks.
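
As a sketch of the intended ownership structure, the layout above can be expressed roughly as follows. This is illustrative only: apart from the stacks map, get_stack, and the Arc/Weak relationship, the type and field names are simplified assumptions rather than the actual snarkVM definitions.

    use std::collections::HashMap;
    use std::sync::{Arc, Weak};

    use parking_lot::RwLock;

    // Shared interior of `Process`: the single, global map of program ID -> stack.
    // `ProgramID<N>` is reduced to a `String` for brevity.
    struct ProcessInner {
        stacks: RwLock<HashMap<String, Arc<Stack>>>,
    }

    struct Process {
        inner: Arc<ProcessInner>,
    }

    // Each stack keeps only a weak back-reference to the shared state, so imports
    // are resolved through the global map on demand. There is no `external_stacks`
    // field, and no `Arc` cycle that could keep stacks alive after eviction.
    struct Stack {
        program_id: String,
        process: Weak<ProcessInner>,
    }

    impl Stack {
        // Hypothetical helper: resolve an imported program (e.g. `one.aleo` from
        // `four.aleo`) through the global map instead of a locally owned copy.
        fn imported_stack(&self, id: &str) -> Option<Arc<Stack>> {
            // `upgrade` returns `None` only if the owning `Process` has been dropped.
            let process = self.process.upgrade()?;
            let stack = process.stacks.read().get(id).cloned();
            stack
        }
    }

In this layout every Stack is owned exactly once, by the map in Process (the blue Arc arrows), while each Stack only points upward through a non-owning Weak (the red arrows), which is why the refactor should not introduce reference cycles or leaks.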

@d0cd d0cd mentioned this pull request Mar 4, 2025
@d0cd d0cd requested review from ljedrz, raychu86, vicsn and niklaslong March 4, 2025 23:14
@ljedrz ljedrz (Collaborator) left a comment

I've left a few comments, but I'm in favor of these changes in general. Is there some specific benchmark where the memory impact could be measured? I'd be happy to run the numbers.

@d0cd d0cd closed this Mar 5, 2025
@d0cd d0cd force-pushed the feat/global-stack-map branch from 7acf383 to 6734bc4 Compare March 5, 2025 23:56
@d0cd d0cd reopened this Mar 6, 2025
@d0cd d0cd changed the base branch from feat/update-deployment-id-map to staging March 6, 2025 00:01
@d0cd d0cd force-pushed the feat/global-stack-map branch from 7acf383 to 2c10b1d Compare March 6, 2025 00:01
@d0cd d0cd requested review from ljedrz and vicsn March 6, 2025 00:11
@d0cd d0cd (Collaborator, Author) commented Mar 6, 2025

@ljedrz I have a branch here where I test building a Process on all of the programs on mainnet and testnet. It could be helpful in sniffing out performance issues and memory leaks. Note that it currently points to a rev for this PR.

Comment on lines +213 to +225

        self.stacks.read().contains_key(program_id)
    }

    /// Returns the stack for the given program ID.
    #[inline]
-   pub fn get_stack(&self, program_id: impl TryInto<ProgramID<N>>) -> Result<&Arc<Stack<N>>> {
+   pub fn get_stack(&self, program_id: impl TryInto<ProgramID<N>>) -> Result<Arc<Stack<N>>> {
        // Prepare the program ID.
        let program_id = program_id.try_into().map_err(|_| anyhow!("Invalid program ID"))?;
        // Acquire the read lock.
        let stacks = self.stacks.read();
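
As context for the signature change above, here is a minimal sketch (simplified types, not the actual snarkVM code) of why get_stack now returns an owned Arc<Stack<N>> rather than &Arc<Stack<N>>: a reference would have to borrow from the RwLock read guard, which is dropped when the function returns, so the Arc is cloned out of the map instead (a cheap reference-count bump) and the lock is released immediately.

    use std::collections::HashMap;
    use std::sync::Arc;

    use parking_lot::RwLock;

    struct Stack;

    struct Process {
        stacks: RwLock<HashMap<String, Arc<Stack>>>,
    }

    impl Process {
        fn get_stack(&self, program_id: &str) -> Option<Arc<Stack>> {
            // Cloning the `Arc` lets the read guard drop at the end of this
            // statement, so callers never hold the global lock while they use
            // the returned stack.
            self.stacks.read().get(program_id).cloned()
        }
    }
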
Collaborator

Is there any chance that the reads happen concurrently using e.g. rayon?

@kpandl found that my recent attempt at a stacks cache resulted in a deadlock; the cause still needs to be investigated, but one potential reason could be ProvableHQ/snarkOS@970386f.

@kpandl it would be great if you could run a stress test with the same setup against this PR. You can let it run for half a day, and if there is no halt, increase the request pressure and we'll try again for another half day.

Collaborator Author (@d0cd)

There's a broader question about the expected concurrency model of Process.

I am assuming that there is a main thread which has R/W privilege and peripheral threads with R privilege.
One concern could be that the main thread runs into a read-write deadlock, but we've kept lock scopes minimal to avoid that. Perhaps the assumptions are wrong.

Curious to see the results of the stress test.

Collaborator

Just to clarify what I think happened in ProvableHQ/snarkOS@970386f, in order:

  1. rayon spawns a number of threads which want to acquire the read-lock.
  2. The main thread wants to acquire the write-lock.
  3. One of the aforementioned rayon threads wants to acquire a new read-lock.

However, rayon will only end all threads and release all read-locks atomically, so this new read-lock can neither be acquired nor released, nor can the pending write-lock be acquired.
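
As a hypothetical sketch (not code from snarkOS or this PR) of how that ordering can wedge with parking_lot's fair RwLock: once a writer is parked, parking_lot queues new read acquisitions behind it, so a thread that already holds a read-lock and then requests a second one can deadlock with the waiting writer.

    use std::sync::Arc;
    use std::time::Duration;

    use parking_lot::RwLock;
    use rayon::prelude::*;

    fn main() {
        let lock = Arc::new(RwLock::new(0u64));

        let workers = {
            let lock = Arc::clone(&lock);
            std::thread::spawn(move || {
                (0..64).into_par_iter().for_each(|_| {
                    // Step 1: rayon worker threads take read-locks.
                    let _outer = lock.read();
                    std::thread::sleep(Duration::from_millis(200));
                    // Step 3: a further read-lock is requested while the writer
                    // below is parked; it is queued behind the writer, so neither
                    // side can make progress.
                    let _inner = lock.read();
                });
            })
        };

        // Step 2: the main thread asks for the write-lock and parks until all
        // outstanding read-locks are released.
        std::thread::sleep(Duration::from_millis(20));
        let _writer = lock.write();

        // This sketch is expected to hang here rather than complete.
        workers.join().unwrap();
    }
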

Collaborator Author (@d0cd)

That's helpful context! I believe we are using parking_lot::RwLock which should prevent writer starvation. Perhaps there's a subtle way this is getting triggered.

@d0cd d0cd requested review from ljedrz and vicsn March 6, 2025 15:14
@ljedrz ljedrz (Collaborator) left a comment

LGTM 👌

@d0cd d0cd force-pushed the feat/global-stack-map branch from d45aeed to 919bee6 Compare March 11, 2025 16:35