Possible issue with T.38 in a NAT'd environment (GCP) #1609

jmordica · 2023-02-08T22:34:45Z

We use rtpengine in GCP, where the Asterisk instances (using SpanDSP) don't have public IPs assigned. rtpengine.conf is set up like this:

interface=internal/10.206.0.11;external/10.206.0.11!33.177.188.61
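The interface= value above packs two logical interfaces into one line. Below is a minimal sketch of how it breaks down. It assumes the NAME/LOCAL_IP[!ADVERTISED_IP] syntax with entries separated by `;`. The parser and its names are hypothetical, not rtpengine code. Both "internal" and "external" bind the same local address 10.206.0.11, and only "external" advertises the public address 33.177.188.61 in SDP.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interface:
    name: str                     # logical interface name, e.g. "internal"
    local_ip: str                 # address the media sockets are bound to
    advertised_ip: Optional[str]  # address written into SDP, if different


def parse_interfaces(spec: str) -> List[Interface]:
    """Hypothetical parser for an rtpengine-style interface= value (sketch only)."""
    result = []
    for entry in spec.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        name, _, addr = entry.partition("/")
        if not addr:
            # No "name/" prefix: treat the whole entry as an address.
            name, addr = "default", name
        local, _, advertised = addr.partition("!")
        result.append(Interface(name, local, advertised or None))
    return result
```

For the config above, `parse_interfaces("internal/10.206.0.11;external/10.206.0.11!33.177.188.61")` yields an "internal" entry with no advertised address and an "external" entry advertising 33.177.188.61, both on the same local IP.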

Kamailio sets the appropriate internal/external or external/internal direction flags based on the direction of the call, and everything except faxes with large content works fine.

The GCP firewall rule includes all of rtpengine udp port range as well.

By "large content" faxes, I simply mean a fax of ~10 pages where each page contains a lot of content.

I've narrowed the issue down to either something going wrong on the rtpengine side or the GCP port/firewall mapping, because I can successfully send the exact same fax to the same environment where the only difference is that rtpengine uses a single interface and no NAT is involved.

I have full debug logs and pcaps of this occurring, but would rather not dump everything here if it isn't needed. I'm just wondering whether you have seen something along these lines before and/or can share any experience about where to look in GCP/network/Kamailio.

Thanks!


carstenbock · 2023-02-09T09:09:35Z

Hi,

Please use the Mailing-List (https://rtpengine.com/mailing-list) for such questions.
Fax is usually rather sensitive to timing, so you will need a precise timer on the device. If the device is a Patton device, for example, you need to look for a model with a "High precision timer" for faxes to work properly for anything beyond 2-3 pages. AVM FritzBox devices (consumer DSL modems, popular in Germany), on the other hand, occasionally have issues when it comes to "large content" faxes.

Luckily I do mobile networks today, so I don't have to deal with Fax and T.38 anymore ;-)

Thanks,
Carsten

rfuchs · 2023-02-09T13:09:58Z

In addition to what @carstenbock has said, AFAIK T.38 has the property of sending packets in one direction only during the transmission phase, without any feedback packets being sent back. You can verify this with Wireshark/tcpdump. This in turn may lead to the NAT mapping of the UDP port being closed by the firewall after a certain time has passed, during which the firewall has not seen any packets in one (usually the "outgoing") direction. This is purely speculation though and you'd have to confirm that this behaviour is actually what your firewall/NAT gateway does.

If that's what happens, the only solution would be to have SpanDSP generate periodic keepalive packets that can be sent out to keep the NAT mapping alive. I'm not sure if such an option exists in SpanDSP.
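The failure mode described above can be illustrated with a toy model. This is a sketch that assumes a NAT which refreshes a UDP binding only on outbound packets and drops it after a fixed idle timeout. Real NAT gateways differ (many also refresh on inbound traffic), and the class and function names are hypothetical. It shows why one-way T.38 traffic could lose its binding and how periodic keepalives would prevent that.

```python
from typing import Optional


class NatMapping:
    """Toy UDP NAT binding, refreshed only by outbound packets (assumption)."""

    def __init__(self, idle_timeout: float):
        self.idle_timeout = idle_timeout
        self.last_outbound: Optional[float] = None

    def outbound(self, now: float) -> None:
        self.last_outbound = now

    def is_open(self, now: float) -> bool:
        return (self.last_outbound is not None
                and now - self.last_outbound <= self.idle_timeout)


def simulate(duration: float, idle_timeout: float,
             keepalive_interval: Optional[float] = None,
             step: float = 1.0) -> Optional[float]:
    """Return the first time (s) the binding is closed, or None if it survives."""
    nat = NatMapping(idle_timeout)
    nat.outbound(0.0)  # an initial outbound packet creates the binding
    next_keepalive = keepalive_interval
    t = 0.0
    while t <= duration:
        # Inbound T.38 data is not modelled: under this assumption it never refreshes.
        if keepalive_interval is not None and t >= next_keepalive:
            nat.outbound(t)  # keepalive sent outbound refreshes the binding
            next_keepalive += keepalive_interval
        if not nat.is_open(t):
            return t
        t += step
    return None
```

With a 60 s idle timeout and no keepalives, `simulate(300, 60)` reports the binding closed at 61 s. Adding `keepalive_interval=20` keeps it open for the whole 300 s.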

jmordica · 2023-02-17T16:27:46Z

I did more testing. We are currently on mr11.1.1.3.

I was able to confirm by using perf that the outbound UDP port/mapping is not closing due to a timeout or the firewall. I can send incoming packets to the NAT'd rtpengine instance over the WAN without any responses for ~4 minutes, and then successfully send a UDP packet from rtpengine back to the remote endpoint without an issue.
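A repeatable version of the test described above can be sketched in Python. One socket sends a stream of one-way datagrams, the other stays silent for the whole period and then replies to the source address it learned from them. The function name `one_way_then_reply` and its parameters are hypothetical. Running both sockets on loopback only exercises the logic. For a real check, the sender would run on the remote host and the receiver on the NAT'd rtpengine host, targeting the public address and a port in the range the firewall allows.

```python
import socket
import time


def one_way_then_reply(duration: float, interval: float,
                       host: str = "127.0.0.1") -> bool:
    """Send one-way probes for `duration` seconds, then check a reply gets back."""
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)    # stands in for the remote endpoint
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # stands in for the NAT'd host
    try:
        sender.bind((host, 0))
        receiver.bind((host, 0))
        sender.settimeout(2.0)
        receiver.settimeout(2.0)
        deadline = time.monotonic() + duration
        sent = 0
        while True:
            sender.sendto(b"probe", receiver.getsockname())
            sent += 1
            if time.monotonic() >= deadline:
                break
            time.sleep(interval)
        # The receiver never answers during the one-way phase; it only
        # records the source address the probes arrived from.
        peer = None
        for _ in range(sent):
            _, peer = receiver.recvfrom(2048)
        receiver.sendto(b"reply", peer)
        data, _ = sender.recvfrom(2048)
        return data == b"reply"
    except socket.timeout:
        return False
    finally:
        sender.close()
        receiver.close()
```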

It seems there is an issue/bug in rtpengine when receiving a large incoming fax page while rtpengine is in a NAT'd environment. Here are a couple of screenshots. The green-colored screenshot is taken from an rtpengine instance using only a single interface in a non-NAT'd environment:

The second, red-colored screenshot is taken from an rtpengine instance in a NAT'd environment, where multiple internal/external interfaces are used, while receiving a large incoming fax in passthrough to Asterisk/SpanDSP:

You can see the port remapping issue in the screenshot above: later on in the communication, rtpengine either forgets the port mappings or mixes them up.
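A shift like the one in the screenshots can be located mechanically. The following sketch assumes packets have already been exported as (time, src_ip, src_port, dst_ip, dst_port) tuples. It reports every point at which the source address/port of packets towards one destination changes. `port_changes` is a hypothetical helper.

```python
from typing import Iterable, List, Optional, Tuple

Addr = Tuple[str, int]
Packet = Tuple[float, str, int, str, int]  # (time, src_ip, src_port, dst_ip, dst_port)


def port_changes(packets: Iterable[Packet],
                 dst: Addr) -> List[Tuple[float, Addr, Addr]]:
    """List (time, old_source, new_source) for each source change towards `dst`."""
    changes = []
    last: Optional[Addr] = None
    for t, sip, sport, dip, dport in packets:
        if (dip, dport) != dst:
            continue  # ignore packets for other destinations and the reverse direction
        src = (sip, sport)
        if last is not None and src != last:
            changes.append((t, last, src))
        last = src
    return changes
```

Pointing it at a destination such as 10.52.2.14:4119 would give the timestamp of the first change, which can then be correlated with rtpengine's debug log.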

Let me know if I need to try another version here or if you have any further advice.

Thanks!

rfuchs · 2023-02-17T17:40:24Z

There's no reason for rtpengine to suddenly change ports or interfaces unless there was some kind of signalling event in between, or it has received inbound packets that would indicate a change in ports. Your screenshot doesn't show where or when ports would have changed (and all packets before the ones highlighted by you are flowing the same way), so I can't tell when or why that would have happened. You need to look for the event that would have triggered the change in ports or interfaces. With debug logging enabled you would also see a log message emitted if ports or interfaces or destinations were ever changed.

There was one recent commit related to local interface switching, but without more details I can't tell if that has anything to do with what you're seeing. But you can try a newer version from the 11.1 or 11.2 branches which have that commit included, for example 11.1.1.7.

themrrobert · 2023-02-18T01:48:25Z

I already looked at the pcap for signalling or other packets coming in from that port to that interface that would explain the observed behavior, and found none, so I ruled those out.

I am double checking now, and for your peace of mind, the following filter shows no results: udp.dstport==4119 && udp.srcport != 12102
So that rules out incoming traffic changing the port. (The only traffic to port 4119 is from 10.206.0.49:12102 => 10.52.2.14:4119)
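The same check can be applied to a text export of the capture. This sketch assumes the fields were exported with something like `tshark -r capture.pcap -T fields -e frame.time_relative -e ip.src -e udp.srcport -e ip.dst -e udp.dstport`, which is tab-separated by default; the file name is a placeholder. It mirrors the display filter above, and the function names are hypothetical.

```python
from typing import List, Tuple

Packet = Tuple[float, str, int, str, int]  # (time, src_ip, src_port, dst_ip, dst_port)


def parse_fields(text: str) -> List[Packet]:
    """Parse tab-separated tshark field output into packet tuples."""
    packets = []
    for line in text.splitlines():
        if "," in line:
            continue  # skip ICMP errors that carry a second (inner) IP/UDP header
        cols = line.split("\t")
        if len(cols) != 5 or not all(cols):
            continue  # skip non-IPv4/UDP rows where a field is empty
        t, sip, sport, dip, dport = cols
        packets.append((float(t), sip, int(sport), dip, int(dport)))
    return packets


def unexpected_sources(packets: List[Packet], dport: int,
                       expected_sport: int) -> List[Packet]:
    """Equivalent of: udp.dstport == dport && udp.srcport != expected_sport."""
    return [p for p in packets if p[4] == dport and p[2] != expected_sport]
```

An empty result from `unexpected_sources(parse_fields(text), 4119, 12102)` corresponds to the filter showing no results.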

There is no SIP or rtpProxyNG signalling between 35s (when it's working fine) and 250s (when it's using the wrong port).

We'll try those branches with debug logging enabled.

themrrobert · 2023-02-18T01:57:30Z

Do you know which commit you're referring to? I'd like to take a look at it, thanks! :)

rfuchs · 2023-02-21T13:22:06Z

That would be 5c65690

jmordica · 2023-02-26T19:55:09Z

I have a recent example with the latest version, 11.2.1.4. Where can I send the pcap and debug logs?

You will be able to see the port changing after the first page is received.

rfuchs · 2023-02-27T14:20:45Z

You can attach them here, or send to my email.

jmordica · 2023-02-27T17:21:44Z

Just sent via email.

Thanks!

jmordica · 2023-03-06T23:03:42Z

I have confirmed that the issue is somewhere in the kernel module. If I disable the kernel module by setting the --table=-1 flag, large faxes work fine. What can I do to help identify the cause? I can produce debug logs and pcaps.

Thanks.

rfuchs · 2023-03-07T13:19:37Z

First thing you should do is inspect the contents of /proc/rtpengine/0/list to see what the kernel's notion of the forwarding chain is. All ports involved are listed there. In the example from your screenshot above, you should see an entry corresponding to local 10.206.0.49:12102 and underneath that you would see the output IP/ports.

jmordica · 2023-03-10T17:46:48Z

I sent you a log of this command:

watch -n 1 -p "date >> /tmp/kmlog.txt; cat /proc/rtpengine/0/list >> /tmp/kmlog.txt"

along with the pcap to see if you can derive anything from that log output.
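For reference, the same sampling can be done without `watch`. This Python sketch appends a timestamp and a copy of /proc/rtpengine/0/list to a log file at a fixed interval. `snapshot_loop` is a hypothetical helper, reading the proc file may require root, and `count=0` means run until interrupted.

```python
import time
from datetime import datetime
from pathlib import Path


def snapshot_loop(source: str, log: str, interval: float = 1.0,
                  count: int = 0) -> int:
    """Append timestamped copies of `source` to `log`; return how many were written."""
    written = 0
    with open(log, "a") as out:
        while count == 0 or written < count:
            out.write(datetime.now().isoformat() + "\n")
            out.write(Path(source).read_text())
            out.flush()  # keep the log current in case the loop is interrupted
            written += 1
            if count == 0 or written < count:
                time.sleep(interval)
    return written
```

Usage, matching the command above: `snapshot_loop("/proc/rtpengine/0/list", "/tmp/kmlog.txt")`.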

jmordica · 2023-03-13T19:12:11Z

Hey, are there any updates here? Or should we just run without the kernel module?

Thanks.

rfuchs · 2023-03-14T15:40:55Z

Right now this is not actionable as there's no indication that there's anything wrong with the code.

jmordica · 2023-03-15T02:06:56Z

How can I help debug further in order to identify the issue?

rfuchs · 2023-03-15T16:53:43Z

The only things I can think of are 1) recreate the exact same scenario outside of GCP (with NAT and all) and see if it does the same thing, and 2) add a debug log line to the kernel module to confirm that the source port is set correctly, for example in send_proxy_packet4(), where *uh is set, something like printk(KERN_INFO "source port %u\n", src->port); and then check dmesg for the output.

jmordica · 2023-03-15T17:08:11Z

I assume there isn't any way to conditionally use the kernel module when setting up a new offer, is there?

It would be nice to just not use the kernel module for certain calls.

rfuchs · 2023-03-15T17:11:00Z

No, we don't have such a feature.
