Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data race in refreshStats #236

Closed
SOF3 opened this issue Nov 8, 2022 · 6 comments · Fixed by #238
Closed

Data race in refreshStats #236

SOF3 opened this issue Nov 8, 2022 · 6 comments · Fixed by #238

Comments

@SOF3
Copy link

SOF3 commented Nov 8, 2022

reproduced with go build -race on latest master 675b8ad:

==================
WARNING: DATA RACE
Write at 0x00c006b6e138 by goroutine 45:
  github.com/salesforce/sloop/pkg/sloop/storemanager.(*StoreManager).refreshStats()
      /data00/home/chankyin/go/src/github.com/salesforce/sloop/pkg/sloop/storemanager/storemanager.go:173 +0x135
  github.com/salesforce/sloop/pkg/sloop/storemanager.(*StoreManager).vlogGcLoop()
      /data00/home/chankyin/go/src/github.com/salesforce/sloop/pkg/sloop/storemanager/storemanager.go:138 +0x22b
  github.com/salesforce/sloop/pkg/sloop/storemanager.(*StoreManager).Start·dwrap·3()
      /data00/home/chankyin/go/src/github.com/salesforce/sloop/pkg/sloop/storemanager/storemanager.go:88 +0x39

Previous write at 0x00c006b6e138 by goroutine 240:
  github.com/salesforce/sloop/pkg/sloop/storemanager.(*StoreManager).refreshStats()
      /data00/home/chankyin/go/src/github.com/salesforce/sloop/pkg/sloop/storemanager/storemanager.go:173 +0x135
  github.com/salesforce/sloop/pkg/sloop/storemanager.(*StoreManager).gcLoop()
      /data00/home/chankyin/go/src/github.com/salesforce/sloop/pkg/sloop/storemanager/storemanager.go:101 +0x259
  github.com/salesforce/sloop/pkg/sloop/storemanager.(*StoreManager).Start·dwrap·2()
      /data00/home/chankyin/go/src/github.com/salesforce/sloop/pkg/sloop/storemanager/storemanager.go:87 +0x39
@kskewes-sf
Copy link
Contributor

kskewes-sf commented Nov 23, 2022

Thanks for reporting.

  1. What version of Go are you using?
  2. Can you please provide the full command, e.g: go build -race ./pkg/sloop/main.go ?

go.mod currently has 1.13 but I think we should bump this to match target Kubernetes project. Something for a separate issue & PR.

Just tried building and running the tests in a loop with race checking enabled and without test caching, no errors. Maybe I was lucky though!

$ go version
go version go1.19.2 linux/amd64

$ for i in {1..100}; do go test -count 1 -race ./...; done
# <snip> all succeeded

$ for i in {1..100}; do go build -race ./...; done
# <snip> all succeeded

@SOF3
Copy link
Author

SOF3 commented Nov 24, 2022

$ go version
go 1.17 linux/amd64

$ go build -race -o output/sloop ./pkg/sloop/

$ ./output/sloop
ERROR: logging before flag.Parse: I1124 05:26:45.519980  878290 config.go:263] Default config set
I1124 05:26:45.534223  878290 server.go:41] SloopConfig: ConfigFile: ""
PrivilegedAccess: true
apiServerHost: ""
badgerDetailLogEnabled: false
badgerDiscardRatio: 0.99
badgerEnableEventLogging: false
badgerKeepL0InMemory: true
badgerLevSizeMultiplier: 0
badgerLevelOneSize: 0
badgerMaxTableSize: 0
badgerNumLevelZeroTables: 0
badgerNumLevelZeroTablesStall: 0
badgerNumOfCompactors: 0
badgerSyncWrites: true
badgerUseLSMOnlyOptions: true
badgerVLogFileIOMapping: false
badgerVLogFileSize: 0
badgerVLogGCFreq: 60000000000
badgerVLogMaxEntries: 200000
badgerVLogTruncate: true
bindAddress: ""
cleanupFrequency: 1800000000000
context: ""
crdRefreshInterval: 300000000000
debugPlaybackFile: ""
debugRecordFile: ""
defaultKind: _all
defaultLookback: 1h
defaultNamespace: default
deletionBatchSize: 1000
disableKubeWatch: false
disableStoreManager: false
displayContext: ""
enableDeleteKeys: false
enableGranularMetrics: false
keepMinorNodeUpdates: false
kubeWatchResyncInterval: 1800000000000
leftBarLinks: null
maxDiskMb: 32768
maxLookBack: 1209600000000000
mockBadger: false
port: 8080
resourceLinks: null
restoreDatabaseFile: ""
storeRoot: ./data
threshold for GC: 0.8
watchCrds: true
webfilesPath: ./pkg/sloop/webserver/webfiles
I1124 05:26:45.537211  878290 kubeclient.go:28] Getting k8s context with user-defined config masterURL=, kubeContextPreference=.
I1124 05:26:45.593880  878290 kubeclient.go:44] Get k8s context with context=kind-kind
I1124 05:26:45.650086  878290 store.go:113] BadgerDB Options: {Dir:data/kind-kind ValueDir:data/kind-kind SyncWrites:true TableLoadingMode:2 ValueLogLoadingMode:2 NumVersionsToKeep:1 ReadOnly:false Truncate:true Logger:<nil> Compression:0 EventLogging:true InMemory:false MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:1048576 NumMemtables:5 BlockSize:4096 BloomFalsePositive:0.01 KeepL0InMemory:true MaxCacheSize:0 MaxBfCacheSize:0 LoadBloomsOnOpen:true NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:200000 NumCompactors:2 CompactL0OnClose:true LogRotatesToFlush:2 ZSTDCompressionLevel:1 VerifyValueChecksum:false EncryptionKey:[] EncryptionKeyRotationDuration:240h0m0s BypassLockGuard:false ChecksumVerificationMode:0 managedTxns:false maxBatchCount:0 maxBatchSize:0}
I1124 05:26:45.650482  878290 kubeclient.go:51] Creating k8sclient with user-defined config masterURL=, kubeContext=kind-kind.
I1124 05:26:45.694534  878290 kubeclient.go:57] Building k8sclient with context=kind-kind, masterURL=https://127.0.0.1:16861, configFile=[/home/chankyin/.kube/config].
I1124 05:26:45.718042  878290 kubeclient.go:74] Created k8sclient with above configurations
E1124 05:26:45.841205  878290 reflector.go:156] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:108: Failed to list *v1beta1.PodDisruptionBudget: the server could not find the requested resource
I1124 05:26:45.880083  878290 kubewatcher.go:142] Found 0 CRD definitions
I1124 05:26:45.880311  878290 kubewatcher.go:153] Stopping 0 CRD Informers
==================
WARNING: DATA RACE
Write at 0x00c005959268 by goroutine 32:
  github.com/salesforce/sloop/pkg/sloop/storemanager.(*StoreManager).refreshStats()
      /data00/home/chankyin/go/src/github.com/salesforce/sloop/pkg/sloop/storemanager/storemanager.go:173 +0x135
  github.com/salesforce/sloop/pkg/sloop/storemanager.(*StoreManager).vlogGcLoop()
      /data00/home/chankyin/go/src/github.com/salesforce/sloop/pkg/sloop/storemanager/storemanager.go:138 +0x22b
  github.com/salesforce/sloop/pkg/sloop/storemanager.(*StoreManager).Start·dwrap·3()
      /data00/home/chankyin/go/src/github.com/salesforce/sloop/pkg/sloop/storemanager/storemanager.go:88 +0x39
I1124 05:26:45.888373  878290 webserver.go:230] Listening on http://localhost:8080

Previous write at 0x00c005959268 by goroutine 252:
  github.com/salesforce/sloop/pkg/sloop/storemanager.(*StoreManager).refreshStats()
      /data00/home/chankyin/go/src/github.com/salesforce/sloop/pkg/sloop/storemanager/storemanager.go:173 +0x135
  github.com/salesforce/sloop/pkg/sloop/storemanager.(*StoreManager).gcLoop()
      /data00/home/chankyin/go/src/github.com/salesforce/sloop/pkg/sloop/storemanager/storemanager.go:101 +0x259
  github.com/salesforce/sloop/pkg/sloop/storemanager.(*StoreManager).Start·dwrap·2()
      /data00/home/chankyin/go/src/github.com/salesforce/sloop/pkg/sloop/storemanager/storemanager.go:87 +0x39

Goroutine 32 (running) created at:
  github.com/salesforce/sloop/pkg/sloop/storemanager.(*StoreManager).Start()
      /data00/home/chankyin/go/src/github.com/salesforce/sloop/pkg/sloop/storemanager/storemanager.go:88 +0xf8
  github.com/salesforce/sloop/pkg/sloop/server.RealMain()
      /data00/home/chankyin/go/src/github.com/salesforce/sloop/pkg/sloop/server/server.go:141 +0x143e
  main.main()
      /data00/home/chankyin/go/src/github.com/salesforce/sloop/pkg/sloop/main.go:29 +0x15d

Goroutine 252 (running) created at:
  github.com/salesforce/sloop/pkg/sloop/storemanager.(*StoreManager).Start()
      /data00/home/chankyin/go/src/github.com/salesforce/sloop/pkg/sloop/storemanager/storemanager.go:87 +0x90
  github.com/salesforce/sloop/pkg/sloop/server.RealMain()
      /data00/home/chankyin/go/src/github.com/salesforce/sloop/pkg/sloop/server/server.go:141 +0x143e
  main.main()
      /data00/home/chankyin/go/src/github.com/salesforce/sloop/pkg/sloop/main.go:29 +0x15d
==================
E1124 05:26:46.851326  878290 reflector.go:156] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:108: Failed to list *v1beta1.PodDisruptionBudget: the server could not find the requested resource
^CI1124 05:26:47.359835  878290 webserver.go:244] Shutting down server...
I1124 05:26:47.361243  878290 webserver.go:248] WebServer closed
I1124 05:26:47.361306  878290 kubewatcher.go:364] Stopping kubeWatcher
I1124 05:26:47.361454  878290 processing.go:98] Waiting for outstanding processing to finish
I1124 05:26:47.361549  878290 storemanager.go:159] Starting store manager shutdown
I1124 05:26:47.361955  878290 storemanager.go:97] Store manager main loop exiting
I1124 05:26:47.362031  878290 storemanager.go:134] ValueLogGC loop exiting
I1124 05:26:47.362093  878290 server.go:185] RunWithConfig finished
I1124 05:26:47.362181  878290 store.go:120] Closing store
I1124 05:26:47.406780  878290 store.go:122] Finished closing store
I1124 05:26:47.411154  878290 main.go:34] Shutting down gracefully
Found 1 data race(s)

Exit code: 66

@SOF3
Copy link
Author

SOF3 commented Nov 24, 2022

go build -race alone doesn't generate data race warnings (since no user code actually executed); it only generates warnings when the program is run.

@kskewes-sf
Copy link
Contributor

kskewes-sf commented Nov 27, 2022

Thanks for the reproduction steps.

Have confirmed with an empty Kind cluster.
Using a cluster with data didn't seem to trigger the race.
Looks like this behaviour was expected during #94.
See: https://github.com/salesforce/sloop/pull/94/files#diff-64d89434ab8ac9efb02a3250f78318411f6bb5cd9bf69958895f17aee5127f6dR140-R142

refreshStats() could be protected by a Mutex to avoid the race.
I'm new to the codebase so will do a little more investigation before raising a PR (briefly validated locally).

@SOF3
Copy link
Author

SOF3 commented Nov 28, 2022

I guess a sync.Once would be good enough

@kskewes-sf
Copy link
Contributor

Yes that's lighter than a mutex but we would still have both goroutines potentially try upsert sm.stats and emitMetrics() at the same time. This might not matter in practice. TBC.

Another alternative to sync.Once would be to initialise sm.stats during NewStoreManager(...).
This doesn't prevent the upsert/emit metrics race above either.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants