Repeated peer confirmation and kernelisation messages #1633
Looks like one of these ports was receiving media from two sources:
If you have something like …
Hi, thank you for catching that! I was wondering if that was the case, but wasn't sure how you determined it from the log.
Could such an outcome arise as a result of RTPEngine being shared among multiple, unrelated Kamailio instances? That is the case here. We have traced this port, :43562, to two roughly consecutive calls which were initiated by Kamailio A and then Kamailio B. The flapping endpoints …

Since RTPEngine itself allocates its local ports, this was not expected to be a problem. Surely RTPEngine would not allocate a local port to two targets at once, even if they are the result of two different offer/answer commands? Or is this type of race condition possible if the two Kamailios are connected to two different LWPs handling these commands, for example?

I suppose the fundamental question is: is it safe to share RTPEngine among multiple Kamailio instances? I assumed it was. My reasoning was: since every Kamailio worker process makes dedicated sockets to communicate with RTPEngine, there would have to be some sort of mutex across multiple LWPs/threads for port allocations anyway, because not all commands are going to land on the same process (if there are multiple ones), even from a single Kamailio instance.
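The serialization being reasoned about here can be sketched as follows. This is a hypothetical Python model (not rtpengine's actual code; `PortPool` and `handle_offer` are made-up names): a single lock-protected pool means no two concurrent offer handlers can receive the same port, regardless of which control socket or worker thread the command arrived on.

```python
import threading

class PortPool:
    """Toy model of a media port pool shared by all control workers.

    Illustrative sketch only, not rtpengine's actual allocator.
    """

    def __init__(self, lo=30000, hi=30100):
        self._lock = threading.Lock()
        self._free = set(range(lo, hi, 2))  # even-numbered RTP ports
        self._used = {}                     # port -> call id

    def allocate(self, call_id):
        # The lock serializes allocations across all threads/LWPs,
        # so a port can never be handed to two calls at once.
        with self._lock:
            port = min(self._free)
            self._free.discard(port)
            self._used[port] = call_id
            return port

    def release(self, port):
        with self._lock:
            self._used.pop(port, None)
            self._free.add(port)

pool = PortPool()
results = []

def handle_offer(call_id):
    results.append((call_id, pool.allocate(call_id)))

threads = [threading.Thread(target=handle_offer, args=(f"call-{i}",))
           for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every concurrent "offer" got a distinct local port.
ports = [p for _, p in results]
assert len(ports) == len(set(ports))
```

Under this model it would indeed not matter how many Kamailio instances send commands, which matches the poster's intuition.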
Perhaps you're running into the same thing as #1634?
I think that may be the case! I had no idea; this wasn't mentioned in the …

Out of curiosity, why would the server ID matter to properly isolating the port allocations? As far as I understand it, RTPEngine just receives forwarding allocation requests (in the form of offer/answer commands) and responds to them. As far as I know, it does not, and has no need to, comprehend the nature of the endpoint that is steering it. Is there a larger philosophy behind this?
Also, does this mean that I should set different server ID values for each instance?
We discovered this ourselves only recently 😊
It doesn't as such, but with colliding server IDs and colliding process IDs and colliding sequence numbers, rtpengine ends up detecting messages as duplicates and will return the same SDP as for a different call.
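The failure mode described above can be sketched with a toy reply cache. This is an illustrative model, not rtpengine's code: assume the retransmit-detection key is built from the sender's server ID, process ID, and sequence number, so two unrelated Kamailio instances that collide on all three get each other's cached replies.

```python
# Toy model of retransmit detection in a request/reply protocol whose
# message cookie is derived from (server id, sender pid, sequence no.).
# Hypothetical sketch; names are made up for illustration.

reply_cache = {}  # cookie -> previously sent reply

def handle_command(server_id, pid, seq, call_id):
    cookie = (server_id, pid, seq)
    if cookie in reply_cache:
        # Looks like a retransmit: resend the cached reply verbatim.
        return reply_cache[cookie]
    reply = f"SDP for {call_id}"
    reply_cache[cookie] = reply
    return reply

# Two *different* Kamailio instances that happen to share a server ID,
# a worker PID, and a sequence number:
r1 = handle_command(server_id=0, pid=1500, seq=7, call_id="call-A")
r2 = handle_command(server_id=0, pid=1500, seq=7, call_id="call-B")

# call-B's command was misdetected as a duplicate and received
# call-A's SDP.
assert r1 == r2 == "SDP for call-A"
```

This is why the server ID matters even though rtpengine otherwise doesn't care who is steering it: it only disambiguates otherwise-identical message cookies.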
Use unique server ID values for each Kamailio instance.
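Assuming the knob in question is Kamailio's core `server_id` parameter (my assumption; that is the global instance-ID setting Kamailio exposes), a minimal sketch of the fix would look like:

```cfg
# kamailio.cfg on Kamailio A
server_id=1

# kamailio.cfg on Kamailio B
server_id=2
```

Any scheme works as long as no two instances sharing an rtpengine use the same value.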
The word …
Depends on how reliable your logging is, I guess. You can double-check further by seeing whether there's any overlap in process IDs between the Kamailio instances.
The logging is very reliable, and there are no overlapping Kamailio PIDs on the two systems. Kamailio A PIDs are in the 31,000 range, while Kamailio B PIDs are in the 1,500 range.
Definitely not the same situation then. The only other possible explanation is that the local port was re-used for a new call very quickly after being closed from a previous call, and that a misbehaving remote RTP client kept sending media to the old port after the call was closed.
I've got a call on Kamailio A that used this local port beginning at 20:18:13, while the delete command for the old call, managed by Kamailio B, did not show up until 20:18:34. That call began around 20:10:00. So they were running concurrently, which means the same port was handed to a new call while the old call using it was still active.
Do you have the SDPs to show this? Because it shouldn't be possible to open the same port twice.
Well, the SDP offer & answer for the more recent call is included in the initial post for this issue. The SDP offer for the older, original call on Kamailio B is:
The SDP answer fed into RTPEngine is a JsSIP answer:
The output SDPs would be of interest, i.e. the ones that show port 43562 being used.
Oh, okay. For the older call, here is the output offer, which contains port 43562. This is coming from a FreeSWITCH host, being WebRTC-ified, and going out to a JsSIP endpoint.
And here's the output SDP answer for the more recent call, which is plain UDP coming from a commercial SBC to FreeSWITCH, with RTPEngine in between the two. We are answering it, and our answer advertises port 43562:
And 4.4.4.4 is the same as xxx.xxx.xxx.xxx?
No, 4.4.4.4 is the public Internet side of RTPEngine ("external_eip" interface), while xxx.xxx.xxx.xxx is a private interface that has the designation "external" (not to be confused with yet another interface called "internal"). It had not occurred to me, but the full invocation could be relevant:
Just to confirm, in …
That's right, … And I hear you, but I think they are indeed opening the same port at the same time. At least, that's where all the information points. Would you not agree?
That can't be right. The kernel module doesn't allow this either. It would report an error if the same local IP:port was added a second time. |
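The kernel-module behavior being described, i.e. refusing a second add of the same local IP:port rather than silently sharing it, can be modeled as a keyed-table insert that fails on collision. This is a hypothetical sketch for illustration, not the actual module code; all names here are made up.

```python
# Toy model of a forwarding table keyed by local (ip, port).
# Inserting the same key twice is refused, mirroring the described
# behavior of reporting an error instead of double-allocating.

class DuplicateTargetError(Exception):
    pass

forward_table = {}  # (local_ip, local_port) -> call id

def add_target(local_ip, local_port, call_id):
    key = (local_ip, local_port)
    if key in forward_table:
        raise DuplicateTargetError(f"{local_ip}:{local_port} already in use")
    forward_table[key] = call_id

add_target("4.4.4.4", 43562, "older WebRTC call")
try:
    add_target("4.4.4.4", 43562, "newer plain-UDP call")
    second_add_ok = True
except DuplicateTargetError:
    second_add_ok = False

# The second add is rejected; the port is never shared by two calls.
assert second_add_ok is False
```

So if both calls really did hold port 43562 simultaneously on the same interface, under this model an error would have been logged at the second add.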
Well, I no longer have the RTP captures (they were rotated out), so I can't prove it. But the output SDPs came straight from the captures. The offer toward the WebRTC endpoint was at ~20:10:00, the offer toward the regular carrier endpoint was at ~20:18:13, and the delete command for the WebRTC call didn't come until 20:18:34.
The immediate resolution to this problem has been to separate the RTPEngines used for Kamailio A and Kamailio B. We are watching for the same signs of port collisions and will follow up to let you know. It's certainly possible that collisions occur for reasons other than offer/answer requests from different controlling Kamailio instances.
There's no distinction between requests coming from different Kamailio instances, so I'm not sure how that would make a difference.
I'm not either, which makes me think this problem is tied up in the interface definition scheme here (aliased vs. unaliased).
You were certainly correct that merely eliminating multiple Kamailio instances controlling the same RTPEngine instance did not solve the problem. But so far, the examples we have seen of this problem all involve overlapping port assignment on calls where one call goes out the public … I know you said that port collisions between these are impossible, but I'm not sure that's the case, given what we're seeing. We'll continue digging.
Somehow this smells like a misconfiguration. Maybe double-check which addresses the actual sockets are bound to (netstat or ss -anp), and check whether there's a stray config file in /etc.
I explored the misconfiguration idea as best I could, but can't see anything. There are no stray config files.
Does it not strike you as odd that …
We tried something new: we aliased the external AWS EIP to a different interface, and suddenly, no more port collisions:
I'm sorry, but I think you are mistaken about the separation of ports between the aliased …
No, that's just how the output is formatted (one output per logical interface). The port pools are actually the same. You can see that the number of "ports used" and the "last port used" is always the same.
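The arrangement described above, with multiple logical interface names drawing from one underlying port pool so that their per-interface statistics move in lockstep, can be sketched like this (hypothetical Python; `PortPool` and the interface names are illustrative, not rtpengine internals):

```python
# Illustrative sketch: two logical interface names backed by the
# *same* underlying port pool, so "ports used" / "last port used"
# statistics are identical for both.

class PortPool:
    def __init__(self, lo, hi):
        self.next_port = lo  # also serves as "last port used" + 2
        self.hi = hi
        self.used = 0

    def allocate(self):
        port = self.next_port
        self.next_port += 2  # even RTP ports
        self.used += 1
        return port

shared = PortPool(30000, 40000)
interfaces = {"external": shared, "external_eip": shared}

p1 = interfaces["external"].allocate()
p2 = interfaces["external_eip"].allocate()

# Allocations interleave because there is only one pool underneath,
# and both names report identical pool statistics.
assert p1 != p2
assert interfaces["external"].used == interfaces["external_eip"].used == 2
```

Under this model, the per-interface lines in the status output are just two views of one counter, which is exactly the "always the same" observation above.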
In `mr11.2.1.5` on kernel `4.18.0-425.10.1.el8_7.x86_64`, on some calls, we are seeing hundreds of messages like this in the rtpengine logs, essentially instantaneously. On this call, it was 400+.

I confirmed via wire protocol trace that there was only a single `offer`, `answer` and `delete` command, so no re-invites. There's nothing unusual about the SDP; it's as banal as can be.

Offer:

Answer:

I cannot find any correlation between any SDP attributes and this phenomenon occurring. I also checked to make sure there were no duplicate `-j RTPENGINE` rules in the `INPUT` chain; there are not.

Would appreciate any insight into the cause of this!