[pull] master from grpc:master #2

Open
wants to merge 1,412 commits into
base: master
Changes from 1 commit
Commits
1412 commits
13efb55
[benchmark] Remove redundant callback benchmarks (#37935)
drfloob Oct 16, 2024
9945066
[call-v3] Send flow control (#37868)
ctiller Oct 16, 2024
7e06934
[chaotic-good] Pushback on message writes until they reach serializat…
ctiller Oct 16, 2024
5c5bf14
[ConfigFetcher] Set HTTP2 error to NO_ERROR to do graceful GOAWAYs (#…
yashykt Oct 17, 2024
29d3e58
[XdsCoreE2ETest] Use wait_for_ready on RPCs to wait for xds enabled s…
yashykt Oct 17, 2024
3da4d62
[retry test] fix flake caused by callback ordering (#37944)
markdroth Oct 18, 2024
da4fb49
[pick_first_new] extend experiment expiration time (#37947)
markdroth Oct 18, 2024
61be7b0
[EventEngine] [reland] Migrate `chttp2_server` to use EE DNSResolver …
yijiem Oct 19, 2024
1b6601b
[C++] Fix Directory Walker on NetBSD. (#37700)
0-wiz-0 Oct 19, 2024
e8addfa
test (#37956)
ctiller Oct 19, 2024
36188db
Port https://github.com/grpc/grpc-go/pull/6686 to C++ (#37950)
rainwoodman Oct 21, 2024
faa9ed6
[alts] ALTS server handshaker should return early if there is no hand…
matthewstevenson88 Oct 21, 2024
39f4195
[alts] Downgrade log level when handshaker service recv_buffer is nul…
matthewstevenson88 Oct 21, 2024
316fa27
Temporarily skipping `TestChannelReady.channel_ready_blocked` test du…
sreenithi Oct 21, 2024
9110b0d
adding back logging threshold tests (#37843)
sourabhsinghs Oct 21, 2024
4ef9cfd
Automated rollback of commit e8addfac9edfb0a2c62d71d0bda919237e290c48.
ctiller Oct 21, 2024
3ef7b76
[alts] Update max concurrent streams to ALTS handshaker service to 10…
matthewstevenson88 Oct 21, 2024
c3e83b8
[Release] Bump core version in preparation for 1.68 Branch Cut (#37941)
gnossen Oct 21, 2024
4662017
[EE] Prevent crash when address can't be resolved (#37952)
eugeneo Oct 21, 2024
14f22c7
[EventEngine] Disable the backup poller if all EventEngine experiment…
drfloob Oct 21, 2024
eacb2f7
Changed Bazel/Workspace to use @com_google_protobuf//python/dist:syst…
veblush Oct 21, 2024
c5999db
[call-v3] Fix leak with cq-based server (#37972)
ctiller Oct 21, 2024
1178b2d
[retry e2e test] add log message about known flakiness (#37974)
markdroth Oct 21, 2024
d411b44
[EventEngine] Stop using the CallbackAlternativeCQ (#37933)
drfloob Oct 22, 2024
b28ec6d
updated default run_tests max version to py313 (#37945)
sreenithi Oct 22, 2024
d56c939
[Windows] Hack to prewarm cache (`GetAdaptersAddresses`) for core_end…
yijiem Oct 22, 2024
36b9ee9
[ruby] explicitly enumerate ruby tests, don't use rake to run them (#…
apolcyn Oct 23, 2024
a24c8cc
[Build] Dropped Bazel 6.x support (#37979)
veblush Oct 23, 2024
1b997c5
[test] Disable //test/core/end2end/... on Windows (#37983)
drfloob Oct 23, 2024
357bfa7
[Dep] Upgraded rule_cc to 0.0.12 (#37978)
veblush Oct 23, 2024
53c7d45
[cleanup] Rm blank line (#37988)
drfloob Oct 23, 2024
8b74961
[promises] Add a promise-based match operator (#37981)
ctiller Oct 24, 2024
aa4dd17
[test] Increase timeout for windows portability tests (#37989)
yashykt Oct 24, 2024
17f0e28
[bazel] `proto_root` was appended to `dir_out` earlier, so don't dupl…
asedeno Oct 24, 2024
2213447
Bump master version to 1.69.0-dev (#38000)
gnossen Oct 25, 2024
2525988
[release] add v1.67.0 release to interop client matrix (#37985)
apolcyn Oct 25, 2024
8aa71db
[xDS gcp_authn filter] remove upper bound for cache size (#38005)
markdroth Oct 25, 2024
06eda49
[bazel] fix includes to use proper paths from child packages
markdroth Oct 28, 2024
ff7d726
[Python Bazel] Use `PyInfo` provider and `py_*` rules from rules_pyth…
mering Oct 28, 2024
5c31076
[AuthContext] Introduce an ConnectionContext class to hold arbitrary …
Vignesh2208 Oct 28, 2024
88b5c9e
[TokenFetcherCredentials] fix backoff behavior (#38004)
markdroth Oct 28, 2024
f0b514c
[test] Fix bug that broke core end2end tests Windows (#37918)
drfloob Oct 28, 2024
2329b25
[ObjC] fix lock inversion in dns service resolver shutdown (#38010)
HannahShiSFB Oct 28, 2024
cdac698
Update rules_python to 0.35.0 (#37996)
mering Oct 29, 2024
7e5dc14
[EventEngine] Workaround for missing data bug on endpoint/socket shut…
drfloob Oct 29, 2024
0bd12ba
[EventEngine] API contract: Endpoint::Read will provide either an err…
drfloob Oct 29, 2024
a16441b
[XdsClient] Add missing authority to XdsClient metrics scope (#38009)
yashykt Oct 29, 2024
265c7be
[chttp2] Fix channelz address (#38022)
yashykt Oct 29, 2024
4b41df2
[EventEngine] Add stronger wording on Endpoint::Read contract (#38036)
drfloob Oct 30, 2024
84b4525
[XdsE2ETest] Fix MtlsWithAggregateCluster to gracefully switch (#38037)
yashykt Oct 30, 2024
14ac94d
[Build] Override MACOSX_DEPLOYMENT_TARGET for gRPC Python (#37997)
veblush Oct 30, 2024
55b1ae9
[build] replace grpc_proto_library rules with separate proto_library/…
markdroth Oct 30, 2024
175d099
[DNS resolver] Call address_sorting_init and address_sorting_shutdown…
Vignesh2208 Oct 30, 2024
c2d899f
[interop] Add grpc-java 1.66.0-1.68.1 to client_matrix.py (#38021)
ejona86 Oct 30, 2024
99f6ee2
Adjust latent_see to fill based on time since last flush
ctiller Oct 31, 2024
ca1b57c
Add latent see annotations to more of grpc
ctiller Oct 31, 2024
a966a6f
[sanity] fix it (#38043)
ctiller Oct 31, 2024
dbd9b1e
[Build] Upgraded gcc to 14 (#38041)
veblush Oct 31, 2024
226a24c
[PHP] remove PersistentChannelTest.testInitHelper (#38042)
HannahShiSFB Oct 31, 2024
60d7444
[TSI] Print cert verification status on handshake failure (#37207)
csapuntz Oct 31, 2024
ab90aaf
[Test] Remove unnecessary cmake 3.16 installation (#38048)
veblush Oct 31, 2024
dfdda9e
Prepare code for breaking change in Protobuf C++ API.
evalon32 Nov 1, 2024
30506ff
[Python Aio] Fix test_cancel_after_done_writing (#38051)
XuanWang-Amos Nov 1, 2024
a6682db
[CI] Updated RBE Windows Configuration to use Bazel 7.3.1 (#38006)
veblush Nov 1, 2024
2ce4c64
Fix core typos (#38028)
NathanBaulch Nov 1, 2024
3bb0ea4
[PH2][NewFile][ClassStructure] Add client and server class (#37840)
tanvi-jagtap Nov 4, 2024
7f664c6
Improve metadata redaction comment (#38033)
tanvi-jagtap Nov 4, 2024
cb57a04
[chaotic-good] Land a second copy as an experiment (#38026)
ctiller Nov 4, 2024
574b19e
[Build] Use -msse2 option only for 32-bit Intel (#38024)
veblush Nov 4, 2024
86a68b4
Add new EventEngine::Extension to allow transport to send and receive…
ananda1066 Nov 4, 2024
e37d384
Add new HTTP2 frame type SecurityFrame for security-related data. Als…
ananda1066 Nov 4, 2024
9049ce0
[ServerStreaming10Messages] Add tracers for test (#38039)
yashykt Nov 4, 2024
6c05780
[EventEngine] Migrate `chaotic_good_server` to use EE DNSResolver (#3…
yijiem Nov 4, 2024
2a9c752
[Dart] Rebuild grpc_interop_dart docker image from dart:stable (#38049)
kannanjgithub Nov 5, 2024
dd19ed3
[ssl] Downgrade SSL handshake failure log to INFO. (#38058)
matthewstevenson88 Nov 5, 2024
0c11076
[AuthContext] Embed the connection context inside auth context to all…
Vignesh2208 Nov 5, 2024
edb6c62
Remove no-op gRPC Pull Request Artifact builds
drfloob Nov 5, 2024
be627d4
Increase VM timeout for windows/grpc_portability_build_only
drfloob Nov 5, 2024
2eaaa9c
Automated rollback of commit 0c1107636fff68229a6832ead51eabeaabb316d4.
Nov 5, 2024
5011420
[proto] revert to old-style BUILD rules for channelz and reflection
markdroth Nov 6, 2024
c4682fe
[Build] Revert "[Build] Use -msse2 option only for 32-bit Intel (#380…
veblush Nov 6, 2024
d53dde7
[Fix Flake] Fix contextvar test issue (#38076)
XuanWang-Amos Nov 6, 2024
6a0377e
[chttp2] Fix comments and messages (#38071)
yashykt Nov 6, 2024
6a0c483
[CI] Upgraded clang to 19 for sanity tests (#38070)
veblush Nov 6, 2024
9765f16
Remove unused definition of ResolvedAddrToUnixPathIfPossible (#38016)
hferreiro Nov 6, 2024
63cb58d
Fix link about gRPC Server Reflection for Node.js (#37524)
y-yagi Nov 6, 2024
bda98db
Update 'Local security connector' experiment (#38045)
erm-g Nov 6, 2024
0e96f83
[http-proxy] Add a log message sampling HTTP proxy connect failures (…
ctiller Nov 6, 2024
f7b0454
Select the appropriate dependency when Bazel's `os:windows` constrain…
abau-g Nov 6, 2024
d60ebf7
[xDS e2e tests] use real xDS protos
markdroth Nov 7, 2024
a644115
[CI] Updated RBE Windows Image (MSVC 2022) (#38063)
veblush Nov 7, 2024
7b6a5be
[EventEngine] Improve Windows IOCP test: variable lifetimes (#38085)
drfloob Nov 7, 2024
0148b49
Automated rollback of commit 2eaaa9cbf7226054df83b19ddd6d530a2e8edb32.
Vignesh2208 Nov 7, 2024
0672a7a
[CI] Used dockcross/manylinux2014-aarch64 for aarach64 artifact docke…
veblush Nov 7, 2024
ff23f44
[github] mark tools/distrib/python/xds_protos as auto-generated (#38083)
markdroth Nov 8, 2024
6db0a2e
[RetryFilter] Copy the SliceBuffer from RetryFilter's cache to batch …
yijiem Nov 8, 2024
740d219
[PHP] Fix flaky MacOS tests (#38090)
ajinkyakulkarni75 Nov 8, 2024
5fcc5f8
[Chttp2Transport] Flush data out over the transport quickly under hig…
Vignesh2208 Nov 8, 2024
da58cff
Cleanup gRPC's protobuf usage within Google
ctiller Nov 8, 2024
1be5e4e
[work-serializer] Enable `work_serializer_dispatch` everywhere (#38054)
ctiller Nov 9, 2024
c367ab1
[build] Fix it (#38095)
ctiller Nov 9, 2024
dc9af5a
reducing number of tooling tests to one (#37848)
sourabhsinghs Nov 11, 2024
e3039bc
Add public target that exports grpc_slice headers
Nov 11, 2024
be472f1
[PH2] New Experiments (#38103)
tanvi-jagtap Nov 12, 2024
8342a10
Fix python typos (#38029)
NathanBaulch Nov 12, 2024
630d790
[chaotic-good] Revamp wire format (#37765)
ctiller Nov 12, 2024
06f61ab
[http1] fix HttpRequest to support query params (#38099)
markdroth Nov 12, 2024
602c3ac
[xds e2e tests] apply test slowdown factor for does-not-exist timeout…
markdroth Nov 12, 2024
2159447
[testing] set -O1 in msan builds (#38118)
apolcyn Nov 13, 2024
5c09060
bump timeout for grpc_bazel_rbe_nonbazel from 90 minutes to 2 hours
apolcyn Nov 13, 2024
06b2452
Add metadata type for W3C traceparent header
yashykt Nov 14, 2024
c0f22d1
[StatsPlugin] Use lock-free list for global stats plugins list (#38060)
yashykt Nov 14, 2024
9751fab
[test] Re-enable end2end test on Windows now that EE is rolled out (#…
drfloob Nov 14, 2024
ed7854e
[test] remove unused file (#38109)
ctiller Nov 14, 2024
a0d9ddf
[latent-see] Improve visibility of party wakeups (#38053)
ctiller Nov 14, 2024
f3d00ac
[test] Remove unused script (#38110)
ctiller Nov 14, 2024
d0a7c33
[httpcli_test_util] clean up and modernize logic (#38121)
markdroth Nov 14, 2024
35e1bfa
[xDS] add auto_host_rewrite to human-readable form of route config (#…
markdroth Nov 14, 2024
94cbb67
Automated rollback of commit 06b2452feb982cb9fb9a035017208f2ce002adcb.
yashykt Nov 14, 2024
e352e89
[c-ares] Fix inverted length check in GrpcPolledFdWindows (#38101)
yijiem Nov 14, 2024
ef9e350
[fuzzing] Add a define that we can leverage to choose different codep…
ctiller Nov 15, 2024
551499c
set bazel --test_timeout in grpc_bazel_rbe_nonbazel to 1.5 hours
apolcyn Nov 15, 2024
10fa208
[chaotic-good] ensure client transport advertises shutdown (#38134)
ctiller Nov 15, 2024
9166bb9
[metrics] Fix test flakiness (#38128)
yashykt Nov 15, 2024
f9e372b
Automated rollback of commit 94cbb6760858f5fd223d114afdc66b30a8dadda8.
yashykt Nov 15, 2024
3f4f949
[client-channel] log formatting cleanup (#38133)
ctiller Nov 15, 2024
83380d2
[mpsc] Reads should fail on read closed (#38138)
ctiller Nov 15, 2024
4c48dee
[cancel_after_invoke] Additional corpora (#38132)
ctiller Nov 16, 2024
bc35dc1
[promise] better visibility into seqs (#38135)
ctiller Nov 16, 2024
b1890d8
Make grpc++_test target public
yashykt Nov 16, 2024
7f535a6
[promises] increase debuggability of loop (#38137)
ctiller Nov 17, 2024
6c37069
[chaotic-good] Fix recursive mutex deadlock (#38150)
ctiller Nov 18, 2024
3cc611c
[party] use ee for max threadyness (#38139)
ctiller Nov 18, 2024
45dacbe
[interop] Add v1.65.1, v1.66.3, v1.67.1, v1.68.0 releases of grpc-go …
purnesh42H Nov 18, 2024
a55c066
[Deps] Updated OpenTelemetry to the HEAD (#38140)
veblush Nov 18, 2024
14e077f
Extend chaotic_good_legacy_protocol expiry date
ctiller Nov 18, 2024
d61d88d
Revert "[party] use ee for max threadyness (#38139)" (#38153)
ctiller Nov 19, 2024
4eb73bc
[flake] Fix recursive mutex issue in legacy chaotic good (#38156)
ctiller Nov 20, 2024
cb93754
[CI] Keep ninja test only for Windows. (#38159)
veblush Nov 20, 2024
bcc04e7
[Orca Service] Gracefully fail the incoming RPC if it fails deseriali…
Vignesh2208 Nov 20, 2024
9801f6d
[deps] Upgrade google/benchmark to v1.9.0 (#38163)
drfloob Nov 20, 2024
bef33bd
increased timeout for windows grpc basictests python (#38162)
sourabhsinghs Nov 20, 2024
67d82ec
[OnCall] Minor change to Python binary metadata documentation (#38127)
tanvi-jagtap Nov 21, 2024
a5703a0
[pick_first] fix shutdown bug in new PF impl (#38144)
markdroth Nov 21, 2024
c333d60
[OTel] Set prometheus exporter option to populate otel scope (#38170)
yashykt Nov 21, 2024
394118d
[chaotic-good] Multi-connection support (#38032)
ctiller Nov 22, 2024
7570d8b
[reorg] move src/core/lib/config -> src/core/config (#37847)
markdroth Nov 22, 2024
c6dccd4
[fix] Add missing corpus for fuzzer (#38175)
ctiller Nov 22, 2024
7826ddc
interop-testing: update the Interop-test-descriptions doc to reflect …
zbilun Nov 22, 2024
cfda657
[CI] Added a gRPC_BUILD_TESTS guard to third_party protos (#38179)
veblush Nov 22, 2024
d1035ba
[chaotic-good] Fix timeout in test suite (#38177)
ctiller Nov 23, 2024
8e78a1a
[BmPicker] Reduce benchmark iterations for tsan builds to avoid timeo…
Vignesh2208 Nov 25, 2024
cccd74c
[StatsPlugingTest] Fix flakiness by increasing sleep duration in test…
Vignesh2208 Nov 25, 2024
989fed6
[CSM] Use xds-enabled server and xds credentials in examples (#38192)
yashykt Nov 26, 2024
c390a8b
[benchmark] Reenable OSS benchmarks for dotnet. (#38186)
paulosjca Nov 26, 2024
36e534d
[Examples] Remove grpcpp_admin dependency (#38196)
yashykt Nov 26, 2024
40b867f
[xDS RBAC] Support string_match in HeaderMatcher (#38185)
yashykt Nov 26, 2024
69d325c
[experiments] Extend expiry dates of experiments.
Vignesh2208 Nov 26, 2024
fb03c50
[Http2] Gathering stats related to peer's flow control characteristic…
Vignesh2208 Nov 26, 2024
a428e2c
[CSM Observability] Add option to use Xds server (#38194)
XuanWang-Amos Nov 26, 2024
18feac3
[benchmark] Update image prefix for PSM benchmarks. (#38200)
paulosjca Nov 26, 2024
9d9a89a
[PH2][Refactor][Minor] Remove magic number for clarity (#38166)
tanvi-jagtap Nov 27, 2024
e414257
[resource_quota] Add global control knob for container memory usage (…
ctiller Dec 2, 2024
9d26956
[experiments] Extend expiry of time_caching_in_party (#38211)
ctiller Dec 2, 2024
05146d8
[chttp2] Allow clients to reject on too many streams (#38176)
ctiller Dec 2, 2024
9651af5
[Alarm] Fix Alarm reuse on cancellation (#38114)
yijiem Dec 2, 2024
721ab4f
[party] Add a hold mechanism, eliminate bulk spawns (#38149)
ctiller Dec 2, 2024
59691b0
[Build] Upgraded Clang 19 (#38038)
veblush Dec 2, 2024
091d9cb
[party] Fix multiple wakeup bug (#38213)
ctiller Dec 2, 2024
7819891
[Deps] Updated protobuf to v29 (#38066)
veblush Dec 3, 2024
adeb465
Automated rollback of commit 9651af5b436d1cf37753f29e2480af63849391c0.
ctiller Dec 3, 2024
84aca4f
[promises] Allow void returns (#38217)
ctiller Dec 3, 2024
e9c16ac
[fuzzing-event-engine] Remove time sweeps (#37889)
ctiller Dec 3, 2024
c5735cf
[chaotic-good] Disable e2e tests on Windows (#38218)
ctiller Dec 4, 2024
e15fd5a
[xds] Add a test case for "unhealthy" endpoint status (#38225)
eugeneo Dec 4, 2024
c741a56
[release] Bump core version (#38224)
markdroth Dec 4, 2024
cb4172d
[reorg] move json and http_client tests to util (#38120)
markdroth Dec 5, 2024
c387134
[ObjC] Updated Abseil to 1.20240722.0 (#38214)
HannahShiSFB Dec 5, 2024
8bf7149
[Test] Fix logging test (#38232)
veblush Dec 5, 2024
a5ed6fa
[ObjC] Fix address sorting issue (#38215)
HannahShiSFB Dec 5, 2024
cde3276
[release] Bump dev version on master to 1.70.0-dev (#38236)
markdroth Dec 5, 2024
06a7463
[LB unit tests] use only FuzzingEventEngine (#37744)
markdroth Dec 5, 2024
2980de4
[util] Add t-digest implementation (#38229)
ctiller Dec 6, 2024
1ef140c
[chaotic-good] Chunked messages (#38052)
ctiller Dec 6, 2024
b3a44f1
[backoff] cap initial backoff at max backoff (#38239)
markdroth Dec 6, 2024
32f5136
Clarify the concurrency of OnDone with respect to the reactor, and ca…
tkoeppe Dec 6, 2024
5513993
[Test] Add CallbackSetInCallback test case (#38234)
yijiem Dec 6, 2024
fe8bd94
[chttp2] Prioritize sending out finished requests over other in fligh…
Vignesh2208 Dec 6, 2024
87b60ce
Update compression_cookbook.md (#37856)
henrytien Dec 9, 2024
c61456b
[call-v3] Retry Interceptor (#37816)
ctiller Dec 9, 2024
8df11e2
[experiments] Extend privacy experiments temporarily (#38252)
ananda1066 Dec 9, 2024
f36c6ae
[util] Kolmogorov-Smirnov test approximated over T-Digests (#38245)
ctiller Dec 9, 2024
2aa4aa2
[Test] Fix error_details_test (#38233)
veblush Dec 10, 2024
190fdf1
decrease concurrency for windows grpc_distribtests_python (#38161)
sourabhsinghs Dec 10, 2024
cd57733
Revert "increased timeout for windows grpc basictests python" (#38174)
sourabhsinghs Dec 10, 2024
25813a0
[call-v3] Put retries under an experiment (#38255)
ctiller Dec 10, 2024
cb5563c
Automated rollback of commit f36c6aed760b739cb8d1ff8375f756de4f8c7dca.
ctiller Dec 10, 2024
4309f97
[chaotic-good] Lazy connection establishment (#38244)
ctiller Dec 10, 2024
eef07f8
[Ruby] increase ruby test timeout (#38258)
alto-ruby Dec 11, 2024
5b3709b
[PH2] Add new debug only trace flag (#38250)
tanvi-jagtap Dec 11, 2024
39a3889
[call-v3] Actually experimentalize retries (#38268)
ctiller Dec 11, 2024
cf81cce
[Build] Prerequisites to Bazel 8 upgrade (#38261)
veblush Dec 11, 2024
1771592
[Build] Upgraded Bazel to 7.4.1 (#38262)
veblush Dec 11, 2024
6ca7889
[tdigest] Lessen stringency on a test in the middle of the range (#38…
ctiller Dec 11, 2024
bac8e34
[chaotic-good] Cleanup (#38273)
ctiller Dec 12, 2024
b4605e6
[alts] Reduce logging frequency when the ALTS handshaker service retu…
tanvi-jagtap Dec 12, 2024
4b87a16
[dns] Small tweak to readability of dns resolver code (#38253)
ctiller Dec 12, 2024
11b971e
[atm] Remove gpr_atm_no_barrier_clamped_add (#38263)
ctiller Dec 12, 2024
4e0c1d4
[xDS-enabled server] fix status code when RDS resource doesn't exist …
markdroth Dec 13, 2024
0498194
Fix lifetime issue with EventEngine endpoint wrapper. Before, we woul…
ananda1066 Dec 13, 2024
782814e
[chaotic-good] Hide log line behind tracer (#38285)
ctiller Dec 14, 2024
b82b79b
[Python CSM] Change example to use Helloworld service (#38283)
XuanWang-Amos Dec 16, 2024
62e9e44
[Python Observability] update observability example (#38272)
XuanWang-Amos Dec 16, 2024
f502e75
[BoringSSL] Update third_party/boringssl-with-bazel (#38274)
veblush Dec 17, 2024
9a10ab9
[PH2][Documentation][Promise] Document the Join combinator
tanvi-jagtap Dec 17, 2024
d2615ff
[interop] Add grpc-java 1.68.2 and 1.69.0 to client_matrix.py (#38292)
kannanjgithub Dec 17, 2024
a04e2f5
[PH2][Promise][Test] Test for if combinator (#38294)
tanvi-jagtap Dec 17, 2024
d04a97b
[Reland] [EventEngine] Migrate httpcli to use EventEngine DNSResolver…
yijiem Dec 17, 2024
3f001f7
Use Label() instead of specifying the repo name (#38280)
mering Dec 17, 2024
4e65d13
Remove privacy experiments and update expirations (#38299)
ananda1066 Dec 17, 2024
0b9ff8e
[experiments] Fix expiration date (#38310)
ananda1066 Dec 18, 2024
1d49120
[experiments] Verify expiration date makes sense (#38301)
ctiller Dec 18, 2024
4a72b88
[Dep] Updated opentelemetry-cpp to 1.18.0 (#38317)
veblush Dec 18, 2024
c4ce477
[Ruby] update bundle before building the native lib (#38303)
alto-ruby Dec 19, 2024
8184d79
[CI] Updated Bazel to 7.4.1 in rbe_ubuntu2004 (#38318)
veblush Dec 19, 2024
bb8fc89
[experiments] Actually increment the year... (#38322)
ctiller Dec 19, 2024
24d3d0c
[Dep] Updated google-cloud-cpp-2.33.0 (#38321)
veblush Dec 19, 2024
01673b0
[CI] Updated Bazel to 7.4.1 in rbe_windows (#38320)
veblush Dec 19, 2024
6fa8043
[ring_hash] allow use without xDS, and allow setting endpoint hash ke…
markdroth Dec 19, 2024
b53f405
Automated rollback of commit 6fa8043bf9befb070b846993b59a3348248e6566.
markdroth Dec 20, 2024
8250115
[Ruby] do not use static thread local member in scoped activity (#34563)
alto-ruby Dec 21, 2024
93b2960
[C++] Transition to C++17 (#37919)
veblush Dec 23, 2024
54cab7a
[ring_hash] second attempt: allow use without xDS, and allow setting …
markdroth Dec 23, 2024
9b77138
examples/cpp/route_guide add missing command line parsing (#37857)
RomantsovS Dec 23, 2024
6768331
[xDS e2e tests] refactor some common code into a shared utility metho…
markdroth Dec 27, 2024
b691026
[PSM interop client] add flag to log RPC start and end (#38355)
markdroth Dec 27, 2024
bbefef3
[XdsClient] move ADS encoding and decoding directly into XdsClient (#…
markdroth Dec 27, 2024
c4aa5c4
[xds_client_test] remove now-unnecessary override of timer duration (…
markdroth Dec 27, 2024
c99406f
Added pawbhard as maintainer (#38352)
pawbhard Dec 30, 2024
f39cd0f
[benchmark] Skip broken benchmark on 8 cores. (#38350)
paulosjca Dec 30, 2024
097bc74
[XdsClient] test that we retain the nonce even after unsubscribing fr…
markdroth Jan 2, 2025
42af58b
Replace stddef.h with cstddef (#38289)
atetubou Jan 2, 2025
c1b86d5
[XdsClient] update watcher API (#38269)
markdroth Jan 2, 2025
66b478f
[EventEngine] Track TCP global stats in the PosixEventEngine (#34104)
drfloob Jan 2, 2025
fb34949
[Gpr_To_Absl_Logging] Log noise reduction (#38330)
tanvi-jagtap Jan 3, 2025
9dd6f26
[CI] Examples test sanity (#38369)
pawbhard Jan 3, 2025
c9337c7
[xDS] include resolution_note in all error messages coming from PF an…
markdroth Jan 3, 2025
8caabd5
[benchmark] Log global stats at the end of the QPS benchmarks (#34105)
drfloob Jan 3, 2025
211f4c0
[EventEngine] Add experiment to disable KeepsGrpcInitialized for the …
drfloob Jan 3, 2025
ae9b216
[xDS] update to the latest xDS protos (#38379)
markdroth Jan 3, 2025
[pick_first] fix shutdown bug in new PF impl (grpc#38144)
The bug occurs in the following fairly specific sequence of events:

1. PF gets a resolver update with two or more addresses.  It starts connecting to the first address and starts a Happy Eyeballs timer for 250ms.
   - Note that the timer holds a ref to the `SubchannelList`, which is necessary to trigger the bug below.  If there were only one address, there would be no Happy Eyeballs timer holding a ref here, so the bug would not occur.
2. The first subchannel reports CONNECTING and is seen by the LB policy.
3. The first subchannel reports READY, and the notification hops into the WorkSerializer but has not yet been executed.
4. The timer fires, and the timer callback hops into the WorkSerializer but has not yet been executed.
5. The LB policy gets shut down.  This shuts down the `SubchannelList`, but we fail to actually shut down the underlying `SubchannelState`.
   - This is the bug!  We *should* be shutting down the `SubchannelState` here.
   - Note that if the pending timer callback were not holding a ref to the `SubchannelList`, then the bug would not occur: the `SubchannelList` would have been immediately destroyed, which *would* have shut down the `SubchannelState`.  In particular, note that if the timer had not yet fired, shutting down the `SubchannelList` would cancel the timer, thus releasing the ref immediately and shutting down the `SubchannelState`.  Similarly, if the timer callback had already been seen by the LB policy, then the ref would also no longer be held.
6. The LB policy now sees the READY notification.  This should be a no-op, since PF has already been shut down.  However, because the `SubchannelState` was not shut down, it selects the subchannel instead.
7. The LB policy now sees the timer fire.  This becomes a no-op, but it releases the ref to the `SubchannelList`, thus causing the `SubchannelList` to be destroyed.  However, the `SubchannelState` for the selected subchannel from the previous step is no longer owned by the `SubchannelList`, so it is not shut down.
8. The selected subchannel now reports IDLE.  This causes PF to call `GoIdle()`, and at this point we are holding the last ref to the LB policy, which we try to access after giving up that ref, thus causing a crash.
   - Note that we're not actually holding this ref in order to keep the LB policy alive at this point; the ref actually exists only due to some [tech debt](https://github.com/grpc/grpc/blob/14e077f9bd4444ef5417b20ad05bfa64e4b5929e/src/core/load_balancing/pick_first/pick_first.cc#L196).  We should never be executing this code path to begin with after PF has been shut down, so we shouldn't need that ref.
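
The ownership problem in steps 4 through 6 can be reduced to a small sketch. This is not the gRPC implementation: `FakeSubchannelList`, `FakeSubchannelState`, and `ReadySeenAfterShutdown` are hypothetical names, and `std::shared_ptr`/`std::weak_ptr` stand in for gRPC's own ref-counting. It assumes only what the description above states: a queued timer callback keeps the list alive, so shutting the list down does not release the per-subchannel state unless shutdown drops it explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical stand-in for the per-subchannel state that PF tracks.
struct FakeSubchannelState {};

// Hypothetical stand-in for the subchannel list, which owns the states.
class FakeSubchannelList {
 public:
  explicit FakeSubchannelList(std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
      states_.push_back(std::make_shared<FakeSubchannelState>());
    }
  }

  // Shutdown. With clear_states == false this models the old behavior:
  // the list is marked as shutting down but still owns its states.
  void Orphan(bool clear_states) {
    shutting_down_ = true;
    if (clear_states) states_.clear();  // models the fix
  }

  bool shutting_down() const { return shutting_down_; }

  std::weak_ptr<FakeSubchannelState> state(std::size_t i) const {
    return states_[i];
  }

 private:
  bool shutting_down_ = false;
  std::vector<std::shared_ptr<FakeSubchannelState>> states_;
};

// Replays steps 3-7 and reports whether the stale READY notification
// still found a live state after the policy was shut down.
bool ReadySeenAfterShutdown(bool clear_states_on_orphan) {
  auto list = std::make_shared<FakeSubchannelList>(2);
  std::weak_ptr<FakeSubchannelState> first = list->state(0);

  // Step 3: READY notification queued but not yet run.
  bool selected = false;
  std::function<void()> pending_ready = [first, &selected]() {
    if (first.lock() != nullptr) selected = true;  // state still alive
  };

  // Step 4: timer callback queued; it holds a ref to the list.
  std::function<void()> pending_timer = [ref = list]() { (void)ref; };

  // Step 5: the policy shuts down and drops its own ref to the list.
  list->Orphan(clear_states_on_orphan);
  list.reset();  // the queued timer callback still keeps the list alive

  // Step 6: the queued READY notification finally runs.
  pending_ready();

  // Step 7: the timer callback runs and releases the last list ref.
  pending_timer();
  pending_timer = nullptr;

  return selected;
}
```

Under these assumptions, `ReadySeenAfterShutdown(false)` returns `true` (the stale READY notification is acted on, as in step 6), while `ReadySeenAfterShutdown(true)` returns `false`, which corresponds to the `subchannels_.clear()` call that this change adds to `SubchannelList::Orphan()`.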

Closes grpc#38144

COPYBARA_INTEGRATE_REVIEW=grpc#38144 from markdroth:pick_first_new_fix 4ec9f9e
PiperOrigin-RevId: 698807898
markdroth authored and copybara-github committed Nov 21, 2024
commit a5703a0693b0427f656c60cd09d172fd5968a99c
17 changes: 10 additions & 7 deletions src/core/load_balancing/pick_first/pick_first.cc
@@ -423,7 +423,7 @@ PickFirst::PickFirst(Args args)
 
 PickFirst::~PickFirst() {
   GRPC_TRACE_LOG(pick_first, INFO) << "Destroying Pick First " << this;
-  CHECK(subchannel_list_ == nullptr);
+  CHECK_EQ(subchannel_list_.get(), nullptr);
 }
 
 void PickFirst::ShutdownLocked() {
@@ -744,6 +744,8 @@ void PickFirst::SubchannelList::SubchannelData::SubchannelState::
   // If we're still part of a subchannel list trying to connect, check
   // if we're connected.
   if (subchannel_data_ != nullptr) {
+    CHECK_EQ(pick_first_->subchannel_list_.get(),
+             subchannel_data_->subchannel_list_);
     // If the subchannel is READY, use it.
     // Otherwise, tell the subchannel list to keep trying.
     if (new_state == GRPC_CHANNEL_READY) {
@@ -754,7 +756,7 @@ void PickFirst::SubchannelList::SubchannelData::SubchannelState::
     return;
   }
   // We aren't trying to connect, so we must be the selected subchannel.
-  CHECK(pick_first_->selected_.get() == this);
+  CHECK_EQ(pick_first_->selected_.get(), this);
   GRPC_TRACE_LOG(pick_first, INFO)
       << "Pick First " << pick_first_.get()
       << " selected subchannel connectivity changed to "
@@ -803,15 +805,14 @@ void PickFirst::SubchannelList::SubchannelData::OnConnectivityStateChange(
       << ", p->subchannel_list_=" << p->subchannel_list_.get()
       << ", p->subchannel_list_->shutting_down_="
       << p->subchannel_list_->shutting_down_;
-
   if (subchannel_list_->shutting_down_) return;
   // The notification must be for a subchannel in the current list.
-  CHECK(subchannel_list_ == p->subchannel_list_.get());
+  CHECK_EQ(subchannel_list_, p->subchannel_list_.get());
   // SHUTDOWN should never happen.
-  CHECK(new_state != GRPC_CHANNEL_SHUTDOWN);
+  CHECK_NE(new_state, GRPC_CHANNEL_SHUTDOWN);
   // READY should be caught by SubchannelState, in which case it will
   // not call us in the first place.
-  CHECK(new_state != GRPC_CHANNEL_READY);
+  CHECK_NE(new_state, GRPC_CHANNEL_READY);
   // Update state.
   absl::optional<grpc_connectivity_state> old_state = connectivity_state_;
   connectivity_state_ = new_state;
@@ -935,7 +936,7 @@ void PickFirst::SubchannelList::SubchannelData::RequestConnectionWithTimer() {
   if (connectivity_state_ == GRPC_CHANNEL_IDLE) {
     subchannel_state_->RequestConnection();
   } else {
-    CHECK(connectivity_state_ == GRPC_CHANNEL_CONNECTING);
+    CHECK_EQ(connectivity_state_.value(), GRPC_CHANNEL_CONNECTING);
   }
   // If this is not the last subchannel in the list, start the timer.
   if (index_ != subchannel_list_->size() - 1) {
@@ -1021,6 +1022,8 @@ void PickFirst::SubchannelList::Orphan() {
       << "[PF " << policy_.get() << "] Shutting down subchannel_list " << this;
   CHECK(!shutting_down_);
   shutting_down_ = true;
+  // Shut down subchannels.
+  subchannels_.clear();
   // Cancel Happy Eyeballs timer, if any.
   if (timer_handle_.has_value()) {
     policy_->channel_control_helper()->GetEventEngine()->Cancel(*timer_handle_);
40 changes: 25 additions & 15 deletions test/core/load_balancing/lb_policy_test_lib.h
@@ -60,6 +60,7 @@
 #include "src/core/lib/channel/channel_args.h"
 #include "src/core/lib/config/core_configuration.h"
 #include "src/core/lib/event_engine/default_event_engine.h"
+#include "src/core/lib/experiments/experiments.h"
 #include "src/core/lib/iomgr/exec_ctx.h"
 #include "src/core/lib/iomgr/resolved_address.h"
 #include "src/core/lib/security/credentials/credentials.h"
@@ -92,6 +93,9 @@ namespace testing {
 
 class LoadBalancingPolicyTest : public ::testing::Test {
  protected:
+  using FuzzingEventEngine =
+      grpc_event_engine::experimental::FuzzingEventEngine;
+
   using CallAttributes =
       std::vector<ServiceConfigCallData::CallAttributeInterface*>;
 
@@ -573,7 +577,9 @@ class LoadBalancingPolicyTest : public ::testing::Test {
       MutexLock lock(&mu_);
       StateUpdate update{
           state, status,
-          MakeRefCounted<PickerWrapper>(test_, std::move(picker))};
+          IsWorkSerializerDispatchEnabled()
+              ? std::move(picker)
+              : MakeRefCounted<PickerWrapper>(test_, std::move(picker))};
       LOG(INFO) << "enqueuing state update from LB policy: "
                 << update.ToString();
       queue_.push_back(std::move(update));
@@ -698,10 +704,7 @@ class LoadBalancingPolicyTest : public ::testing::Test {
     // Order is important here: Fuzzing EE needs to be created before
     // grpc_init(), and the POSIX EE (which is used by the WorkSerializer)
     // needs to be created after grpc_init().
-    fuzzing_ee_ =
-        std::make_shared<grpc_event_engine::experimental::FuzzingEventEngine>(
-            grpc_event_engine::experimental::FuzzingEventEngine::Options(),
-            fuzzing_event_engine::Actions());
+    fuzzing_ee_ = MakeFuzzingEventEngine();
     grpc_init();
     event_engine_ = grpc_event_engine::experimental::GetDefaultEventEngine();
     work_serializer_ = std::make_shared<WorkSerializer>(event_engine_);
@@ -723,14 +726,16 @@ class LoadBalancingPolicyTest : public ::testing::Test {
     WaitForWorkSerializerToFlush();
     work_serializer_.reset();
     exec_ctx.Flush();
-    // Note: Can't safely trigger this from inside the FakeHelper dtor,
-    // because if there is a picker in the queue that is holding a ref
-    // to the LB policy, that will prevent the LB policy from being
-    // destroyed, and therefore the FakeHelper will not be destroyed.
-    // (This will cause an ASAN failure, but it will not display the
-    // queued events, so the failure will be harder to diagnose.)
-    helper_->ExpectQueueEmpty();
-    lb_policy_.reset();
+    if (lb_policy_ != nullptr) {
+      // Note: Can't safely trigger this from inside the FakeHelper dtor,
+      // because if there is a picker in the queue that is holding a ref
+      // to the LB policy, that will prevent the LB policy from being
+      // destroyed, and therefore the FakeHelper will not be destroyed.
+      // (This will cause an ASAN failure, but it will not display the
+      // queued events, so the failure will be harder to diagnose.)
+      helper_->ExpectQueueEmpty();
+      lb_policy_.reset();
+    }
     fuzzing_ee_->TickUntilIdle();
     grpc_event_engine::experimental::WaitForSingleOwner(
         std::move(event_engine_));
@@ -739,6 +744,12 @@ class LoadBalancingPolicyTest : public ::testing::Test {
     fuzzing_ee_.reset();
   }
 
+  virtual std::shared_ptr<FuzzingEventEngine> MakeFuzzingEventEngine() {
+    return std::make_shared<FuzzingEventEngine>(
+        grpc_event_engine::experimental::FuzzingEventEngine::Options(),
+        fuzzing_event_engine::Actions());
+  }
+
   LoadBalancingPolicy* lb_policy() const {
     CHECK(lb_policy_ != nullptr);
     return lb_policy_.get();
@@ -1465,8 +1476,7 @@ class LoadBalancingPolicyTest : public ::testing::Test {
}
}

std::shared_ptr<grpc_event_engine::experimental::FuzzingEventEngine>
fuzzing_ee_;
std::shared_ptr<FuzzingEventEngine> fuzzing_ee_;
// TODO(ctiller): this is a normal event engine, yet it gets its time measure
// from fuzzing_ee_ -- results are likely to be a little funky, but seem to do
// well enough for the tests we have today.
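The `MakeFuzzingEventEngine()` hook added above is an instance of the overridable factory-method pattern: the base fixture asks a virtual function for its dependency, and a derived fixture substitutes a different implementation. The following is a minimal standalone sketch of that pattern, not the gRPC code. `Engine`, `SpecialEngine`, `BaseFixture`, `SpecialFixture`, and `EngineNameAfterSetUp` are hypothetical names invented for illustration, and the sketch assumes the hook is invoked from a setup method that runs after construction (a virtual call made from a base-class constructor would not reach the derived override).

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for an event engine; not the gRPC class.
class Engine {
 public:
  virtual ~Engine() = default;
  virtual std::string Name() const { return "default"; }
};

// Hypothetical variant that a derived fixture wants to use instead.
class SpecialEngine : public Engine {
 public:
  std::string Name() const override { return "special"; }
};

class BaseFixture {
 public:
  virtual ~BaseFixture() = default;

  // Runs after construction, so virtual dispatch reaches any override.
  void SetUp() { engine_ = MakeEngine(); }

  const Engine& engine() const {
    assert(engine_ != nullptr);  // SetUp() must have been called first.
    return *engine_;
  }

 protected:
  // Factory hook: derived fixtures override this to swap the engine.
  virtual std::shared_ptr<Engine> MakeEngine() {
    return std::make_shared<Engine>();
  }

 private:
  std::shared_ptr<Engine> engine_;
};

class SpecialFixture : public BaseFixture {
 protected:
  std::shared_ptr<Engine> MakeEngine() override {
    return std::make_shared<SpecialEngine>();
  }
};

// Helper for the sketch: sets up a fixture and reports which engine it got.
std::string EngineNameAfterSetUp(BaseFixture& fixture) {
  fixture.SetUp();
  return fixture.engine().Name();
}
```

Calling `EngineNameAfterSetUp` on a `SpecialFixture` goes through the base class's `SetUp()` but still picks up the derived engine, which is the same effect the test fixture relies on when a subclass overrides the factory.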
68 changes: 67 additions & 1 deletion test/core/load_balancing/pick_first_test.cc
@@ -75,7 +75,7 @@ class PickFirstTest : public LoadBalancingPolicyTest {
}

// Gets the order in which the addresses are picked. Return type is void so
// assertions can be used
// assertions can be used.
void GetOrderAddressesArePicked(
absl::Span<const absl::string_view> addresses,
std::vector<absl::string_view>* out_address_order) {
@@ -1172,6 +1172,72 @@ TEST_F(PickFirstTest, AddressUpdateRetainsSelectedAddress) {
EXPECT_FALSE(subchannel2->ConnectionRequested());
}

// DO NOT USE!
//
// A test class that overrides the FuzzingEventEngine to make timer
// cancellation always fail. This is used to simulate cases where, at
// the moment that the timer is cancelled, the timer has already fired
// but the timer callback has not yet run in the WorkSerializer.
//
// TODO(roth): This is a really ugly hack. As part of changing these
// tests to use the FuzzingEventEngine exclusively, we should instead
// find a way to tick the FuzzingEventEngine to the right point so that
// we don't need this ugliness.
class PickFirstNoCancelTimerTest : public PickFirstTest {
protected:
class FuzzingEventEngineWithoutTimerCancellation : public FuzzingEventEngine {
public:
using FuzzingEventEngine::FuzzingEventEngine;

bool Cancel(TaskHandle) override { return false; }
};

std::shared_ptr<FuzzingEventEngine> MakeFuzzingEventEngine() override {
return std::make_shared<FuzzingEventEngineWithoutTimerCancellation>(
grpc_event_engine::experimental::FuzzingEventEngine::Options(),
fuzzing_event_engine::Actions());
}
};

// This exercises a bug seen in the wild that caused a crash. For
// details, see https://github.com/grpc/grpc/pull/38144.
TEST_F(PickFirstNoCancelTimerTest, SubchannelNotificationAfterShutdown) {
// Send an update containing two addresses.
constexpr std::array<absl::string_view, 2> kAddresses = {
"ipv4:127.0.0.1:443", "ipv4:127.0.0.1:444"};
absl::Status status = ApplyUpdate(
BuildUpdate(kAddresses, MakePickFirstConfig(false)), lb_policy());
EXPECT_TRUE(status.ok()) << status;
// LB policy should have created a subchannel for each address.
auto* subchannel = FindSubchannel(kAddresses[0]);
ASSERT_NE(subchannel, nullptr);
auto* subchannel2 = FindSubchannel(kAddresses[1]);
ASSERT_NE(subchannel2, nullptr);
// When the LB policy receives the first subchannel's initial connectivity
// state notification (IDLE), it will request a connection.
EXPECT_TRUE(subchannel->ConnectionRequested());
// This causes the subchannel to start to connect, so it reports CONNECTING.
subchannel->SetConnectivityState(GRPC_CHANNEL_CONNECTING);
// LB policy should have reported CONNECTING state.
ExpectConnectingUpdate();
// Now shut down the LB policy.
// This will cancel the Happy Eyeballs timer, but since we're using a
// FuzzingEventEngine that fails timer cancellations, it simulates the
// case where the timer has already fired but the timer callback has
// not yet run inside the WorkSerializer.
lb_policy_.reset();
// Now the subchannel reports READY. Before the bug fix, this caused
// us to select the subchannel instead of ignoring the notification.
// With the bug fix, this update should never actually be delivered to
// the LB policy, since it will have already shut down the subchannel.
subchannel->SetConnectivityState(GRPC_CHANNEL_READY);
// Now trigger the Happy Eyeballs timer to fire.
IncrementTimeBy(Duration::Milliseconds(250));
// Now the subchannel reports IDLE. Before the bug fix, this
// triggered a crash.
subchannel->SetConnectivityState(GRPC_CHANNEL_IDLE);
}

TEST_F(PickFirstTest, WithShuffle) {
constexpr std::array<absl::string_view, 6> kAddresses = {
"ipv4:127.0.0.1:443", "ipv4:127.0.0.1:444", "ipv4:127.0.0.1:445",
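The test above depends on a timer whose cancellation always reports failure, so the timer callback still runs after the owner has shut down. The sketch below reproduces that situation in miniature, under stated assumptions: `FakeTimerEngine`, `NoCancelEngine`, `Policy`, and `RunShutdownScenario` are hypothetical names, and nothing here is the gRPC `EventEngine` API. It only illustrates why a callback has to check for shutdown itself when `Cancel()` can return false.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>

// Hypothetical miniature timer engine; not the gRPC EventEngine API.
class FakeTimerEngine {
 public:
  using Handle = int;
  virtual ~FakeTimerEngine() = default;

  // Schedules a callback and returns a handle that can be cancelled.
  Handle RunAfter(std::function<void()> cb) {
    Handle handle = next_handle_++;
    pending_[handle] = std::move(cb);
    return handle;
  }

  // Returns true if the callback was removed and will never run.
  virtual bool Cancel(Handle handle) { return pending_.erase(handle) > 0; }

  // Runs every callback that is still pending.
  void FireAll() {
    std::map<Handle, std::function<void()>> callbacks;
    callbacks.swap(pending_);
    for (auto& entry : callbacks) entry.second();
  }

 private:
  Handle next_handle_ = 0;
  std::map<Handle, std::function<void()>> pending_;
};

// Simulates "the timer already fired": cancellation always fails.
class NoCancelEngine : public FakeTimerEngine {
 public:
  bool Cancel(Handle) override { return false; }
};

// Hypothetical owner of a timer, loosely modelled on an LB policy.
class Policy {
 public:
  explicit Policy(FakeTimerEngine* engine) : engine_(engine) {
    timer_ = engine_->RunAfter([this] { OnTimer(); });
  }

  void Shutdown() {
    shutdown_ = true;
    engine_->Cancel(timer_);  // May fail, so OnTimer() must also guard.
  }

  int callbacks_run() const { return callbacks_run_; }
  int timer_actions() const { return timer_actions_; }

 private:
  void OnTimer() {
    ++callbacks_run_;
    if (shutdown_) return;  // Ignore a timer that outlived shutdown.
    ++timer_actions_;
  }

  FakeTimerEngine* engine_;
  FakeTimerEngine::Handle timer_ = 0;
  bool shutdown_ = false;
  int callbacks_run_ = 0;
  int timer_actions_ = 0;
};

// Returns {callbacks run, timer actions taken} after shutdown and firing.
std::pair<int, int> RunShutdownScenario(FakeTimerEngine& engine) {
  Policy policy(&engine);
  policy.Shutdown();
  engine.FireAll();
  return {policy.callbacks_run(), policy.timer_actions()};
}
```

With the ordinary engine the callback is removed and never runs. With `NoCancelEngine` it runs once, and only the shutdown check keeps it from acting on an object that has already been shut down.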