[chassis-packet] Change the Orchagent redis pop batch size to 128 to handle link notification faster. #21771

Merged
merged 66 commits into from
Feb 20, 2025

@abdosi abdosi commented Feb 18, 2025

What I did:
Change the Orchagent redis pop batch size to 128 to handle link notification faster.

Why I did:
While running the Ixia BGP convergence test https://github.com/sonic-net/sonic-mgmt/blob/master/tests/snappi_tests/multidut/bgp/test_bgp_outbound_uplink_multi_po_flap.py we found that SAI route programming is slow, at around 1500 routes/sec (it takes approximately 40 sec to program 60K routes, consistently across multiple iterations).
Because of this slowness, Orchagent (OA) does not process a Link Notification immediately even when one is available: the current OA processes 1024 entries (route entries in our case) before it can pick up the Link Notification. A batch of 1K entries can take about 2 sec, and if the link notifications are slightly spread out (not back to back), each one can end up waiting behind a batch of 1K entries that accumulates about 2 sec of SAI delay.

To optimize link processing and give OA more chances to pick up Link Notifications, we reduce the OA pop batch size to 128 entries. This reduced the overall convergence time to about 2 sec.
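
As a rough illustration of the numbers above, the sketch below estimates how long a link notification can wait behind a single in-flight batch. It assumes the per-entry cost scales linearly from the observed figure of about 2 sec per 1K entries; the function name and the constant are hypothetical and are not part of orchagent.

```python
# Assumption: per-entry cost derived from the observation that a batch of
# about 1K (1024) entries takes about 2 sec, scaled linearly to other sizes.
SECONDS_PER_ENTRY = 2.0 / 1024


def worst_case_link_wait(batch_size: int, seconds_per_entry: float) -> float:
    """Upper bound on the wait of a link notification that arrives just
    after a full batch of entries has started processing."""
    return batch_size * seconds_per_entry


for batch_size in (1024, 128):
    wait = worst_case_link_wait(batch_size, SECONDS_PER_ENTRY)
    print(f"batch={batch_size:5d}  worst-case wait per notification ~ {wait:.2f} sec")
```

Under this model a notification waits up to about 2 sec behind a 1024-entry batch but only about 0.25 sec behind a 128-entry batch, and several spread-out notifications each pay that cost separately.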

Changing the OA batch size from 1K to 128 entries does not have any impact on route programming, as the SAI slowness seems to be tied to sequential processing at about 1500 routes/sec. However, it helps process Link Notifications more quickly. In our test it reduced convergence time from about 12-15 sec to about 2 sec.
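
The claim that batch size does not change total route programming time can be checked with a toy model. The sketch below is a simplified simulation, not orchagent code: it assumes entries are programmed strictly sequentially at a fixed rate (1500 routes/sec is the figure quoted above), that a pending link notification is only noticed between batches, and that handling it costs nothing. All names are hypothetical.

```python
def simulate(total_routes, batch_size, routes_per_sec, link_event_time):
    """Return (total_programming_time, link_notification_latency) in seconds.

    Toy model: the loop drains up to batch_size route entries at a time and
    checks for a pending link notification only between batches.
    """
    clock = 0.0
    remaining = total_routes
    handled_at = None
    while remaining > 0:
        # A notification is picked up only at a batch boundary.
        if handled_at is None and clock >= link_event_time:
            handled_at = clock
        n = min(batch_size, remaining)
        clock += n / routes_per_sec  # sequential programming cost
        remaining -= n
    if handled_at is None:
        # The notification arrived during or after the final batch.
        handled_at = max(clock, link_event_time)
    return clock, handled_at - link_event_time


for batch_size in (1024, 128):
    total, latency = simulate(60_000, batch_size, 1500.0, link_event_time=10.0)
    print(f"batch={batch_size:5d}  total={total:.1f} sec  link latency={latency:.3f} sec")
```

In this model the total time for 60K routes stays at about 40 sec for both batch sizes, while the notification latency is bounded by the time of one batch, so it shrinks with the batch size.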

How I verify:

Ran the above Ixia test for multiple iterations with both the 1024 and 128 pop batch sizes and compared the performance.

abdosi and others added 30 commits August 3, 2023 04:47
Signed-off-by: Abhishek Dosi <[email protected]>
higher value so that BGP learnt default route is higher priority.

Signed-off-by: Abhishek Dosi <[email protected]>
save as `slice_type` as part of DEVICE_METADATA

Signed-off-by: Abhishek Dosi <[email protected]>
 save as `slice_type` as part of DEVICE_METADATA for Chassis Device type

Signed-off-by: Abhishek Dosi <[email protected]>
pmon needs to be enabled ASAP to detect ASICs on the Supervisor.
pmon needs to be enabled ASAP for fast bring-up of 400G ports on LCs
because of the CMIS state machine present in PMON.

Signed-off-by: Abhishek Dosi <[email protected]>
@abdosi
Contributor Author

abdosi commented Feb 18, 2025

@anamehra for viz.

@abdosi
Contributor Author

abdosi commented Feb 18, 2025

@rlhui for viz.

@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@abdosi abdosi requested review from liuh-80 and removed request for ZhaohuiS February 18, 2025 18:41

Azure Pipelines successfully started running 1 pipeline(s).

@abdosi
Contributor Author

abdosi commented Feb 18, 2025

@yejianquan for viz.

@deepak0408-eng deepak0408-eng left a comment

LGTM

@abdosi
Contributor Author

abdosi commented Feb 19, 2025

@rlhui: Can you help merge this?

@mssonicbld
Collaborator

Cherry-pick PR to 202411: #21803

@mssonicbld
Collaborator

Cherry-pick PR to msft-202405: Azure/sonic-buildimage-msft#649

ram25794 pushed a commit to ram25794/sonic-buildimage that referenced this pull request Feb 21, 2025
…handle link notification faster. (sonic-net#21771)

Signed-off-by: Abhishek Dosi <[email protected]>
9 participants