Duration most times zero and lots of failed procedures #5

Open
bluemelov1 opened this issue Nov 10, 2024 · 6 comments

Comments

@bluemelov1
Contributor

bluemelov1 commented Nov 10, 2024

I am currently facing the issue that the duration is almost always 0.0 seconds. Only once did I have a run where the duration was 128 seconds. All the other tests gave no result. What could be the problem here?

I've also noticed that often the procedures (I just use registration and de-registration) fail or the core network itself fails, resulting in a restart of the AMF. Are there ways to prevent this?

Thanks a lot!

This is an example statistic:
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Item ┃ Results ┃
┣━━━━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
┃ Duration ┃ 0.0 seconds ┃
┃ N# of UEs ┃ 20 ┃
┃ Successful procedures ┃ 18 UEs ┃
┃ Failed procedures ┃ 2 UEs ┃
┃ Min interval ┃ 0.0 seconds ┃
┃ Avg interval ┃ 11.924201117621529 seconds ┃
┃ Max interval ┃ 22.10416030883789 seconds ┃
┗━━━━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

@tariromukute
Owner

Hi @bluemelov1, apologies for the delayed response. I will look into the duration issue; I noticed it's always zero. I see that 18 out of 20 UEs succeeded, so 2 failures doesn't seem like a lot. Can you run with a log level of 5, i.e., -vvvvv? This will print the procedures that failed. Please share the log file and I will look into it.

Also, which core network are you working with?

@bluemelov1
Contributor Author

Hello @tariromukute,

Thanks for your response! I am using the OAI core network for my benchmark. I tried to put a service mesh on top of the OAI core network to have encryption between the different NFs. So far it works but is far more likely to crash.

I tried a verbosity level of 4 (-vvvv) and 5 (-vvvvv), but the log file below always shows an error where the compliance mapper does not recognize this key: "KeyError: '5GMMRegistrationRequest'"
log_vvvv.txt

Unfortunately I couldn't reproduce a run with 2 out of 20 UEs failing. Either all 20 succeeded or all 20 failed, so I have attached the logs for both cases. Thank you for looking into it!

log_vvv_fail.txt
log_vvv.txt

@tariromukute
Owner

Hi @bluemelov1, I checked the log_vvv_fail.txt file. It looks like the Registration Request is sent but the OAI core network does not respond. Can you please share the following:

  1. The logs for the OAI core network, especially oai-amf.
  2. The manifest files needed to reproduce your setup, i.e., either the Docker Compose file and its respective config, or the Kubernetes deployment files, etc.

Also, can you try with 10 UEs or fewer and see if you get the same behavior?

@tariromukute
Owner

I think I may have found what the issue is. The mysql database might take some time to finish initialising, so in some cases the traffic generator may send the Registration Requests before the UE details exist. If you check the logs for oai-ausf you might find an error about the UE not existing. If you are starting everything at once, this might be the issue. Try starting core-network-traffic-generator a minute after the OAI core network (likely less is enough).

Let me know if that resolves your issue.
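
As a rough illustration of the "start the traffic generator later" suggestion, a fixed delay can be replaced by polling until the database port accepts connections. This is only a sketch: the helper name `wait_for_port`, the host name "mysql" and port 3306 are assumptions, not part of core-network-traffic-generator or the OAI charts, and an open port does not prove that the subscriber rows have already been loaded.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for illustration; not part of the traffic generator.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect only shows that the port is open.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused, timed out, or name not resolvable yet: retry.
            time.sleep(interval)
    return False


# Example call (host and port are assumed values, adjust to the real service):
#     if wait_for_port("mysql", 3306):
#         ...start the traffic generator...
```

A short extra pause after the port opens may still be needed, since the database can accept connections before its initialisation scripts have finished.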

@bluemelov1
Contributor Author

Hello @tariromukute, and sorry for the late response, I was occupied with other tasks.

First of all, I use the OAI deployment for K8s, specifically e2e_scenario/case1 from the official OAI repository. Because I have adapted it to work with other K8s resources from other Helm charts, it's difficult to share the whole structure of my project. In brief, I used Envoy to build a service mesh on top of the core network to secure the communication inside it. If you're really interested I'll open another repository for you to try it out, but I guess that will take some time.

As for your last tip, I have also found that the database takes the longest to boot, so I have excluded it from the Helm chart I use to deploy the CN and keep it always running. Additionally, my CN sometimes did not boot correctly, so I added a 10-second delay so that each NF pod boots on its own. The trafficGenerator is executed manually through another Helm chart. So I think that should be enough time for the CN to set everything up correctly.

I have done another test with 10 UEs, which was successful, and I have saved the logs of the trafficGenerator, AMF and UDR:
tg2.log
udr2.log
amf2.log

Everything seems fine except for the following lines in the UDR log:

[2024-12-12 10:41:16.108] [udr_db] [error] [UE Id 00101] No data available for AccessAndMobilitySubscriptionData!
[2024-12-12 10:41:16.112] [udr_app] [info] [UE Id 00101] Retrieve the access and mobility subscription data of an UE (ID 00101)

I don't get why it just wants to retrieve this data for one of the UEs and not the others. Anyway, this run was successful so I'm not sure how much attention I should give the error.
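
To check whether this error really affects only one UE, the UDR log can be tallied per UE Id. The sketch below assumes every relevant line follows the same bracketed layout as the two lines quoted above; the regular expression and the function name `count_errors_per_ue` are illustrative, not part of OAI.

```python
import re
from collections import Counter

# Matches lines shaped like the UDR excerpt above:
# [timestamp] [component] [level] [UE Id <id>] message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<component>[^\]]+)\] \[(?P<level>[^\]]+)\] "
    r"\[UE Id (?P<ue>[^\]]+)\] (?P<msg>.*)$"
)


def count_errors_per_ue(lines):
    """Count error-level log lines for each UE Id; other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("level") == "error":
            counts[match.group("ue")] += 1
    return counts


sample = [
    "[2024-12-12 10:41:16.108] [udr_db] [error] [UE Id 00101] "
    "No data available for AccessAndMobilitySubscriptionData!",
    "[2024-12-12 10:41:16.112] [udr_app] [info] [UE Id 00101] "
    "Retrieve the access and mobility subscription data of an UE (ID 00101)",
]
print(dict(count_errors_per_ue(sample)))  # {'00101': 1}
```

Feeding it the lines of udr2.log instead of `sample` would show at a glance which UE Ids hit the error.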

In some of the other runs I did, the AMF just crashes, resulting in the traffic generator showing errors about a missing SCTP connection. I have attached the AMF log, where this message caught my attention:

[2024-12-12 11:17:14.237] [system] [info] Caught signal 15

amf_crashed.log
But I'm not sure why the AMF starts shutting down everything afterwards. Do you have any idea why it sometimes happens and sometimes not?
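
For reference, signal number 15 is SIGTERM, which is a termination request sent to the process from outside rather than a fault raised inside the AMF itself. In a Kubernetes deployment a container typically receives it when its pod is being stopped or restarted; whether that is what happened here is an assumption that the pod events would have to confirm. The number can be looked up with Python's standard library:

```python
import signal

# Map the number from the AMF log line to its symbolic name.
sig = signal.Signals(15)
print(sig.name)  # SIGTERM
```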

Also I've tried to change the log level from debug to error for the OAI CN but that didn't solve the crash problem with 20 UEs.

@bluemelov1
Contributor Author

I also found an error in the AUSF that says

[ausf_app] [error] Authentication failure by home network with authCtxId 11da20c0dba0800052cc0ebabd1b5578: AV expired

Afterwards the AMF crashes and then the traffic generator fails. Do you think it's related to some kind of timeout for the authentication vector?

I also tried to play with the flags -i and -n for the traffic generator to slow down the TG so that the CN can handle each request on its own, but I could not see any changes. Could you explain what exactly they do?
