[Cherry-Pick] Fix bug in object_by_id_cache (#20450) #20613

Merged: mystenmark merged 2 commits into releases/sui-v1.38.0-release from mlogan-cp-cache-fix on Dec 12, 2024

mystenmark

Suppose that a reader thread is trying to cache an object that it just
read, while a writer thread is trying to cache an object that it just
wrote. The writer thread by definition has the latest version, while the
reader thread may be out of date. We previously took some care not to
replace a new version with an old version, but this did not take
evictions into account, so the following bug was possible:

  READER                            WRITER
  read object_by_id_cache (miss)
  read dirty set (miss)
                                    write to dirty
  read db (old version)
                                    write to cache (while holding dirty lock)
                                    cache entry is evicted
  write to cache

There is no way for the reader to tell that the value it is caching is
out of date, because the up-to-date entry is already gone from the
cache.
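
The failure can be replayed with a minimal sketch. Everything here is hypothetical and is not the actual `object_by_id_cache` code: `NaiveCache` and its methods are illustrative names, object IDs and versions are plain `u64` values, and the dirty set and db are left out. It only shows why a "never replace newer with older" check stops helping once the newer entry has been evicted.

```rust
use std::collections::HashMap;

/// Hypothetical cache: maps an object id to the cached version number.
#[derive(Default)]
pub struct NaiveCache {
    entries: HashMap<u64, u64>,
}

impl NaiveCache {
    /// Inserts unless an equal or newer version is already cached.
    /// Returns true if the value was stored.
    pub fn insert_if_newer(&mut self, id: u64, version: u64) -> bool {
        match self.entries.get(&id) {
            // A newer (or equal) entry is present, so reject the write.
            Some(&cached) if cached >= version => false,
            // No entry (never cached, or evicted): nothing to compare against.
            _ => {
                self.entries.insert(id, version);
                true
            }
        }
    }

    /// Simulates the cache dropping an entry under memory pressure.
    pub fn evict(&mut self, id: u64) {
        self.entries.remove(&id);
    }

    pub fn get(&self, id: u64) -> Option<u64> {
        self.entries.get(&id).copied()
    }
}
```

If the writer inserts version 2 and that entry is then evicted, a reader holding version 1 passes the check, because there is no longer anything to compare against.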

We fix this by requiring reader threads to obtain a ticket before they
read from the dirty set and/or db. Tickets are expired by writers. Then,
the above case looks like this:

  READER                            WRITER
  get ticket
  read cache (miss)
  read dirty (miss)
                                    write dirty
  read db (old version)
                                    expire ticket
                                    write cache (while holding dirty lock)
                                    cache eviction
  no write to cache (ticket expired)

Any interleaving of the above results in the reader either seeing a
recent version or holding an expired ticket.
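
One way to picture the ticket scheme is the sketch below. It is an illustration under stated assumptions, not the code in this PR: `TicketCache`, `Ticket`, and the method names are hypothetical, a ticket is modeled as a snapshot of a per-key generation counter, and the dirty set and db are omitted. The key point is that the generation counter outlives the cache entry, so a stale reader is still detected after an eviction.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical ticket: the generation of a key at the time the read began.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Ticket(u64);

#[derive(Default)]
struct Inner {
    /// Cached version per object id; entries may be evicted at any time.
    entries: HashMap<u64, u64>,
    /// Generation per object id; kept across evictions (assumption of this sketch).
    generations: HashMap<u64, u64>,
}

#[derive(Default)]
pub struct TicketCache {
    inner: Mutex<Inner>,
}

impl TicketCache {
    /// Readers call this before consulting the dirty set or the db.
    pub fn get_ticket(&self, id: u64) -> Ticket {
        let inner = self.inner.lock().unwrap();
        Ticket(inner.generations.get(&id).copied().unwrap_or(0))
    }

    pub fn get(&self, id: u64) -> Option<u64> {
        self.inner.lock().unwrap().entries.get(&id).copied()
    }

    /// Writers always hold the latest version: expire outstanding tickets
    /// by bumping the generation, then store the value.
    pub fn insert_as_writer(&self, id: u64, version: u64) {
        let mut inner = self.inner.lock().unwrap();
        *inner.generations.entry(id).or_insert(0) += 1;
        inner.entries.insert(id, version);
    }

    /// Readers may only cache what they read if their ticket is still valid.
    /// Returns true if the value was stored.
    pub fn insert_as_reader(&self, id: u64, version: u64, ticket: Ticket) -> bool {
        let mut inner = self.inner.lock().unwrap();
        let current = inner.generations.get(&id).copied().unwrap_or(0);
        if current != ticket.0 {
            return false; // A writer intervened, so the value may be stale.
        }
        inner.entries.insert(id, version);
        true
    }

    /// Simulates eviction: the entry goes away, the generation stays.
    pub fn evict(&self, id: u64) {
        self.inner.lock().unwrap().entries.remove(&id);
    }
}
```

Replaying the interleaving above: the reader takes a ticket, the writer inserts and its entry is evicted, and the reader's `insert_as_reader` call is then rejected because the generation has moved on. In this sketch the generation map grows without bound; a real implementation would need some way to bound it.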



ebmifa commented Dec 12, 2024

@mystenmark, do we need this for v1.38.0, which is already in mainnet? v1.39.0 is getting out to mainnet next week.

@mystenmark mystenmark enabled auto-merge (squash) December 12, 2024 22:10
@ebmifa ebmifa temporarily deployed to sui-typescript-aws-kms-test-env December 12, 2024 22:21 — with GitHub Actions Inactive
@mystenmark mystenmark merged commit 44cbc21 into releases/sui-v1.38.0-release Dec 12, 2024
48 of 52 checks passed
@mystenmark mystenmark deleted the mlogan-cp-cache-fix branch December 12, 2024 22:52