Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-stop segfaults when we switch to a non-local #124

Open
elbow opened this issue Apr 20, 2020 · 12 comments
Open

Non-stop segfaults when we switch to a non-local #124

elbow opened this issue Apr 20, 2020 · 12 comments

Comments

@elbow
Copy link

elbow commented Apr 20, 2020

We are using a request-handler with our drachtio.

    <request-handlers>
        <request-handler sip-method="*" http-method="POST">http://10.95.16.234:8080/sip4.xxx.com</request-handler>
    </request-handlers>

We are trying our k8s and switched this to another instance that is about 25 milliseconds away.

In every other respect the service returns the same response - we tested with contructing requests with curl, eg:

curl -X POST -H 'Content-Type: text/plain' -d "`cat /tmp/packet`"  'http://10.151.192.13:8080/sip4.xxx.com/?method=REGISTER&domain=rtc.xxx.com&protocol=tcp&source_address=10.0.98.192&fromUser=12345&toUser=12345&uriUser=&contentType=&uri=sip:rtc.xxx.com

When we switch over then drachtio segfaults on the first packet it tries to process (or maybe first register?)

2020-04-20 20:31:38.340011 Starting drachtio version v0.8.4-9-g608a588
2020-04-20 20:31:38.340176 Logging threshold:                     4
2020-04-20 20:31:38.340252 Route for outbound connection:         sip-method: *, http-method: POST, http-url: http://10.151.192.13:8080/sip4.fibrephone.telviva.com
2020-04-20 20:31:38.340297 DrachtioController::run tls key file:         /etc/letsencrypt/live/sip.fibrephone.telviva.com/privkey.pem
2020-04-20 20:31:38.340342 DrachtioController::run tls certificate file: /etc/letsencrypt/live/sip.fibrephone.telviva.com/cert.pem
2020-04-20 20:31:38.340374 DrachtioController::run tls chain file:       /etc/letsencrypt/live/sip.fibrephone.telviva.com/fullchain.pem
2020-04-20 20:31:38.340404 DrachtioController::run listening for applications on tcp port 9022 and tls port 0
2020-04-20 20:31:38.341065 DrachtioController::run mtu size for udp packets: 4096
2020-04-20 20:31:38.341130 DrachtioController::run - sipcapture/Homer enabled: udp:10.155.250.80:9060;hep=3;capture_id=104
2020-04-20 20:31:38.341163 Prometheus support disabled
2020-04-20 20:31:38.341194 tcp keep alives will be sent to clients every 45 seconds
2020-04-20 20:31:38.341234 ClientController::threadFunc - ClientController: io_context run loop started (or restarted)
2020-04-20 20:31:38.341334 DrachtioController::run: starting sip stack on sip:*:6060;transport=udp,tcp
2020-04-20 20:31:38.344682 SipTransport::addTransports - creating transport: 0x16a7420: udp/197.155.250.159:6060
2020-04-20 20:31:38.346675 SipTransport::addTransports - creating transport: 0x16a78d0: tcp/197.155.250.159:6060
2020-04-20 20:31:38.348680 SipTransport::addTransports - creating transport: 0x16a7db0: udp/10.155.250.159:6060
2020-04-20 20:31:38.350634 SipTransport::addTransports - creating transport: 0x16a8290: tcp/10.155.250.159:6060
2020-04-20 20:31:38.352587 SipTransport::addTransports - creating transport: 0x16b6720: udp/172.17.0.1:6060
2020-04-20 20:31:38.354563 SipTransport::addTransports - creating transport: 0x16b6bd0: tcp/172.17.0.1:6060
2020-04-20 20:31:38.356511 SipTransport::addTransports - creating transport: 0x16b7080: udp/127.0.0.1:6060
2020-04-20 20:31:38.358502 SipTransport::addTransports - creating transport: 0x16b7530: tcp/127.0.0.1:6060
2020-04-20 20:31:38.360498 SipTransport::addTransports - creating transport: 0x16b79e0: udp/[::1]:6060
2020-04-20 20:31:38.362499 SipTransport::addTransports - creating transport: 0x16b7e90: tcp/[::1]:6060
2020-04-20 20:31:38.364522 DrachtioController::run: adding additional sip address sips:*:6061;transport=tls
2020-04-20 20:31:38.365539 tport_tls.c:439 tls_init_context() tls: initialized ECDH
2020-04-20 20:31:38.366256 tport_tls.c:439 tls_init_context() tls: initialized ECDH
2020-04-20 20:31:38.366917 tport_tls.c:439 tls_init_context() tls: initialized ECDH
2020-04-20 20:31:38.367570 tport_tls.c:439 tls_init_context() tls: initialized ECDH
2020-04-20 20:31:38.368218 tport_tls.c:439 tls_init_context() tls: initialized ECDH
2020-04-20 20:31:38.370549 SipTransport::addTransports - creating transport: 0x16bd0a0: tls/197.155.250.159:6061
2020-04-20 20:31:38.372523 SipTransport::addTransports - creating transport: 0x16c0570: tls/10.155.250.159:6061
2020-04-20 20:31:38.374478 SipTransport::addTransports - creating transport: 0x16c8790: tls/172.17.0.1:6061
2020-04-20 20:31:38.376463 SipTransport::addTransports - creating transport: 0x16d0910: tls/127.0.0.1:6061
2020-04-20 20:31:38.378457 SipTransport::addTransports - creating transport: 0x16d8670: tls/[::1]:6061
2020-04-20 20:31:38.380453 DrachtioController::run: adding additional sip address sips:197.155.250.159:4433;transport=wss
2020-04-20 20:31:38.383072 SipTransport::addTransports - creating transport: 0x16e5d80: wss/197.155.250.159:4433
2020-04-20 20:31:38.385074 SipTransport::logTransports - there are : 16 transports
2020-04-20 20:31:38.385128 SipTransport::logTransports - wss/197.155.250.159:4433 (sips:197.155.250.159:4433;transport=wss, external-ip: , local-net: )
2020-04-20 20:31:38.385163 SipTransport::logTransports - tls/127.0.0.1:6061 (sips:*:6061;transport=tls, external-ip: , local-net: 127.0.0.1/32)
2020-04-20 20:31:38.385194 SipTransport::logTransports - tls/172.17.0.1:6061 (sips:*:6061;transport=tls, external-ip: , local-net: )
2020-04-20 20:31:38.385226 SipTransport::logTransports - tls/10.155.250.159:6061 (sips:*:6061;transport=tls, external-ip: , local-net: 10.0.0.0/8)
2020-04-20 20:31:38.385257 SipTransport::logTransports - tls/197.155.250.159:6061 (sips:*:6061;transport=tls, external-ip: , local-net: )
2020-04-20 20:31:38.385288 SipTransport::logTransports - tls/[::1]:6061 (sips:*:6061;transport=tls, external-ip: , local-net: )
2020-04-20 20:31:38.385322 SipTransport::logTransports - udp/197.155.250.159:6060 (sip:*:6060;transport=udp,tcp, external-ip: , local-net: ), mtu size: 4096
2020-04-20 20:31:38.385356 SipTransport::logTransports - udp/10.155.250.159:6060 (sip:*:6060;transport=udp,tcp, external-ip: , local-net: 10.0.0.0/8), mtu size: 4096
2020-04-20 20:31:38.385388 SipTransport::logTransports - tcp/127.0.0.1:6060 (sip:*:6060;transport=udp,tcp, external-ip: , local-net: 127.0.0.1/32)
2020-04-20 20:31:38.385419 SipTransport::logTransports - tcp/197.155.250.159:6060 (sip:*:6060;transport=udp,tcp, external-ip: , local-net: )
2020-04-20 20:31:38.385450 SipTransport::logTransports - tcp/10.155.250.159:6060 (sip:*:6060;transport=udp,tcp, external-ip: , local-net: 10.0.0.0/8)
2020-04-20 20:31:38.385482 SipTransport::logTransports - udp/172.17.0.1:6060 (sip:*:6060;transport=udp,tcp, external-ip: , local-net: ), mtu size: 4096
2020-04-20 20:31:38.385514 SipTransport::logTransports - tcp/172.17.0.1:6060 (sip:*:6060;transport=udp,tcp, external-ip: , local-net: )
2020-04-20 20:31:38.385546 SipTransport::logTransports - udp/127.0.0.1:6060 (sip:*:6060;transport=udp,tcp, external-ip: , local-net: 127.0.0.1/32), mtu size: 4096
2020-04-20 20:31:38.385580 SipTransport::logTransports - udp/[::1]:6060 (sip:*:6060;transport=udp,tcp, external-ip: , local-net: ), mtu size: 4096
2020-04-20 20:31:38.385611 SipTransport::logTransports - tcp/[::1]:6060 (sip:*:6060;transport=udp,tcp, external-ip: , local-net: )
2020-04-20 20:31:38.385746 Starting sofia event loop in main thread: 139745329432384
2020-04-20 20:31:38.385807 tport_type_udp.c:519 tport_udp_error() tport_udp_error: Connection refused (111) [icmp type=3 code=3]
2020-04-20 20:31:38.385841 tport_type_udp.c:524 tport_udp_error()       reported by [127.0.0.1]:0
2020-04-20 20:31:38.385876 nta.c:2878 agent_tp_error() nta_agent: tport: 127.0.0.1:6060: Connection refused
2020-04-20 20:31:38.588883 recv 654 bytes from wss/[105.233.226.138]:49220 at 20:31:38.588281:
REGISTER sip:rtc.telviva.com SIP/2.0
Via: SIP/2.0/WSS 71eppljkhela.invalid;branch=z9hG4bK1157744
Max-Forwards: 69
To: <sip:[email protected]>
From: "323" <sip:[email protected]>;tag=d9pt58vst6
Call-ID: iodlpr6jmo75bseg11v5nq
CSeq: 229 REGISTER
X-PushId-Platform: web
X-PushId: ca879a05-899f-4dd7-9770-89d34e5b3bb9
Contact: <sip:[email protected];transport=ws>;+sip.ice;reg-id=1;+sip.instance="<urn:uuid:c73aac43-e6e5-4c23-b661-9db463c11628>";expires=600
Expires: 600
Allow: INVITE,ACK,CANCEL,BYE,UPDATE,MESSAGE,OPTIONS,REFER,INFO
Supported: path,gruu,outbound
User-Agent: TelvivaOne web 1.0 000
Content-Length: 0

2020-04-20 20:31:38.591674 RequestHandler::startRequest: sending http POST: http://10.151.192.13:8080/sip4.fibrephone.telviva.com/?method=REGISTER&domain=rtc.telviva.com&protocol=tcp&source_address=105.233.226.138&fromUser=6598323&toUser=6598323&uriUser=&contentType=&uri=sip%3artc.telviva.com
2020-04-20 20:31:38.654289 http 200 response received from server in 0.0611 secs: {"action":"route","data":{"tag":"rtcproxy"}}
2020-04-20 20:31:45.334359 Starting drachtio version v0.8.4-9-g608a588

and so on and so forth.

The coredump says that the crash is like so:

(gdb) bt
#0  0x000000000059fe1d in drachtio::SipTransport::isInNetwork (this=0xc0, address=0x7fffb31e3330 "sip.telviva.com") at ../src/sip-transports.cpp:123
#1  0x00000000005a2660 in drachtio::SipTransport::<lambda(const std::shared_ptr<drachtio::SipTransport>&, const std::shared_ptr<drachtio::SipTransport>&)>::operator()(const std::shared_ptr<drachtio::SipTransport> &, const std::shared_ptr<drachtio::SipTransport> &) const (__closure=0x7fffb31e3320, pA=std::shared_ptr (count 2, weak 1) 0x16bc110, pB=<error reading variable: Cannot access memory at address 0x128>)
    at ../src/sip-transports.cpp:466
#2  0x00000000005a6246 in __gnu_cxx::__ops::_Val_comp_iter<drachtio::SipTransport::findAppropriateTransport(char const*, char const*)::<lambda(const std::shared_ptr<drachtio::SipTransport>&, const std::shared_ptr<drachtio::SipTransport>&)> >::operator()<std::shared_ptr<drachtio::SipTransport>, __gnu_cxx::__normal_iterator<std::shared_ptr<drachtio::SipTransport>*, std::vector<std::shared_ptr<drachtio::SipTransport> > > >(std::shared_ptr<drachtio::SipTransport> &, __gnu_cxx::__normal_iterator<std::shared_ptr<drachtio::SipTransport>*, std::vector<std::shared_ptr<drachtio::SipTransport>, std::allocator<std::shared_ptr<drachtio::SipTransport> > > >) (
    this=0x7fffb31e3320, __val=std::shared_ptr (count 2, weak 1) 0x16bc110, __it=<error reading variable: Cannot access memory at address 0x128>) at /usr/include/c++/5/bits/predefined_ops.h:169
#3  0x00000000005a5e12 in std::__unguarded_linear_insert<__gnu_cxx::__normal_iterator<std::shared_ptr<drachtio::SipTransport>*, std::vector<std::shared_ptr<drachtio::SipTransport> > >, __gnu_cxx::__ops::_Val_comp_iter<drachtio::SipTransport::findAppropriateTransport(char const*, char const*)::<lambda(const std::shared_ptr<drachtio::SipTransport>&, const std::shared_ptr<drachtio::SipTransport>&)> > >(__gnu_cxx::__normal_iterator<std::shared_ptr<drachtio::SipTransport>*, std::vector<std::shared_ptr<drachtio::SipTransport>, std::allocator<std::shared_ptr<drachtio::SipTransport> > > >, __gnu_cxx::__ops::_Val_comp_iter<drachtio::SipTransport::findAppropriateTransport(char const*, char const*)::<lambda(const std::shared_ptr<drachtio::SipTransport>&, const std::shared_ptr<drachtio::SipTransport>&)> >) (__last=std::shared_ptr (empty) 0x0, __comp=...)
    at /usr/include/c++/5/bits/stl_algo.h:1823
#4  0x00000000005a576a in std::__insertion_sort<__gnu_cxx::__normal_iterator<std::shared_ptr<drachtio::SipTransport>*, std::vector<std::shared_ptr<drachtio::SipTransport> > >, __gnu_cxx::__ops::_Iter_comp_iter<drachtio::SipTransport::findAppropriateTransport(char const*, char const*)::<lambda(const std::shared_ptr<drachtio::SipTransport>&, const std::shared_ptr<drachtio::SipTransport>&)> > >(__gnu_cxx::__normal_iterator<std::shared_ptr<drachtio::SipTransport>*, std::vector<std::shared_ptr<drachtio::SipTransport>, std::allocator<std::shared_ptr<drachtio::SipTransport> > > >, __gnu_cxx::__normal_iterator<std::shared_ptr<drachtio::SipTransport>*, std::vector<std::shared_ptr<drachtio::SipTransport>, std::allocator<std::shared_ptr<drachtio::SipTransport> > > >, __gnu_cxx::__ops::_Iter_comp_iter<drachtio::SipTransport::findAppropriateTransport(char const*, char const*)::<lambda(const std::shared_ptr<drachtio::SipTransport>&, const std::shared_ptr<drachtio::SipTransport>&)> >) (__first=std::shared_ptr (empty) 0x0, __last=std::shared_ptr (count 1, weak 1) 0x16bb360, __comp=...)
    at /usr/include/c++/5/bits/stl_algo.h:1850
#5  0x00000000005a4ff2 in std::__final_insertion_sort<__gnu_cxx::__normal_iterator<std::shared_ptr<drachtio::SipTransport>*, std::vector<std::shared_ptr<drachtio::SipTransport> > >, __gnu_cxx::__ops::_Iter_comp_iter<drachtio::SipTransport::findAppropriateTransport(char const*, char const*)::<lambda(const std::shared_ptr<drachtio::SipTransport>&, const std::shared_ptr<drachtio::SipTransport>&)> > >(__gnu_cxx::__normal_iterator<std::shared_ptr<drachtio::SipTransport>*, std::vector<std::shared_ptr<drachtio::SipTransport>, std::allocator<std::shared_ptr<drachtio::SipTransport> > > >, __gnu_cxx::__normal_iterator<std::shared_ptr<drachtio::SipTransport>*, std::vector<std::shared_ptr<drachtio::SipTransport>, std::allocator<std::shared_ptr<drachtio::SipTransport> > > >, __gnu_cxx::__ops::_Iter_comp_iter<drachtio::SipTransport::findAppropriateTransport(char const*, char const*)::<lambda(const std::shared_ptr<drachtio::SipTransport>&, const std::shared_ptr<drachtio::SipTransport>&)> >) (__first=std::shared_ptr (empty) 0x0, __last=std::shared_ptr (count 1, weak 1) 0x16bb360, __comp=...)
    at /usr/include/c++/5/bits/stl_algo.h:1885
#6  0x00000000005a4b5b in std::__sort<__gnu_cxx::__normal_iterator<std::shared_ptr<drachtio::SipTransport>*, std::vector<std::shared_ptr<drachtio::SipTransport> > >, __gnu_cxx::__ops::_Iter_comp_iter<drachtio::SipTransport::findAppropriateTransport(char const*, char const*)::<lambda(const std::shared_ptr<drachtio::SipTransport>&, const std::shared_ptr<drachtio::SipTransport>&)> > >(__gnu_cxx::__normal_iterator<std::shared_ptr<drachtio::SipTransport>*, std::vector<std::shared_ptr<drachtio::SipTransport>, std::allocator<std::shared_ptr<drachtio::SipTransport> > > >, __gnu_cxx::__normal_iterator<std::shared_ptr<drachtio::SipTransport>*, std::vector<std::shared_ptr<drachtio::SipTransport>, std::allocator<std::shared_ptr<drachtio::SipTransport> > > >, __gnu_cxx::__ops::_Iter_comp_iter<drachtio::SipTransport::findAppropriateTransport(char const*, char const*)::<lambda(const std::shared_ptr<drachtio::SipTransport>&, const std::shared_ptr<drachtio::SipTransport>&)> >) (__first=std::shared_ptr (empty) 0x0, __last=std::shared_ptr (count 1, weak 1) 0x16bb360, __comp=...)
    at /usr/include/c++/5/bits/stl_algo.h:1966
#7  0x00000000005a463a in std::sort<__gnu_cxx::__normal_iterator<std::shared_ptr<drachtio::SipTransport>*, std::vector<std::shared_ptr<drachtio::SipTransport> > >, drachtio::SipTransport::findAppropriateTransport(char const*, char const*)::<lambda(const std::shared_ptr<drachtio::SipTransport>&, const std::shared_ptr<drachtio::SipTransport>&)> >(__gnu_cxx::__normal_iterator<std::shared_ptr<drachtio::SipTransport>*, std::vector<std::shared_ptr<drachtio::SipTransport>, std::allocator<std::shared_ptr<drachtio::SipTransport> > > >, __gnu_cxx::__normal_iterator<std::shared_ptr<drachtio::SipTransport>*, std::vector<std::shared_ptr<drachtio::SipTransport>, std::allocator<std::shared_ptr<drachtio::SipTransport> > > >, drachtio::SipTransport::<lambda(const std::shared_ptr<drachtio::SipTransport>&, const std::shared_ptr<drachtio::SipTransport>&)>) (__first=std::shared_ptr (empty) 0x0,
    __last=std::shared_ptr (count 1, weak 1) 0x16bb360, __comp=...) at /usr/include/c++/5/bits/stl_algo.h:4729
#8  0x00000000005a3012 in drachtio::SipTransport::findAppropriateTransport (remoteHost=0x179fa40 "sip:sip.telviva.com", proto=0x741266 "udp") at ../src/sip-transports.cpp:482
#9  0x0000000000538ebe in drachtio::SipDialogController::doSendRequestOutsideDialog (this=0x16e6100, pData=0x7f18f0006ea8) at ../src/sip-dialog-controller.cpp:456
#10 0x0000000000533537 in (anonymous namespace)::cloneSendSipRequest (p=0x16a57e0, msg=0x7fffb31e40e0, arg=0x7f18f0006ea8) at ../src/sip-dialog-controller.cpp:51
#11 0x0000000000675aef in su_base_port_execute_msgs ()
#12 0x0000000000675828 in su_base_port_getmsgs ()
#13 0x0000000000675c07 in su_base_port_run ()
#14 0x0000000000626758 in su_port_run ()
#15 0x00000000006278a6 in su_root_run ()
#16 0x0000000000425417 in drachtio::DrachtioController::run (this=0x16a57e0) at ../src/controller.cpp:1160
#17 0x0000000000413fc9 in main (argc=4, argv=0x7fffb31e4c98) at ../src/main.cpp:47

digging into the stack frames:

(gdb) frame 0
#0  0x000000000059fe1d in drachtio::SipTransport::isInNetwork (this=0xc0, address=0x7fffb31e3330 "sip.telviva.com") at ../src/sip-transports.cpp:123
123	    if (!m_netmask) return false;
(gdb) p m_netmask
Cannot access memory at address 0x150
(gdb) p this
$1 = (const drachtio::SipTransport * const) 0xc0
(gdb) p *this
Cannot access memory at address 0xc0
(gdb) frame 1
#1  0x00000000005a2660 in drachtio::SipTransport::<lambda(const std::shared_ptr<drachtio::SipTransport>&, const std::shared_ptr<drachtio::SipTransport>&)>::operator()(const std::shared_ptr<drachtio::SipTransport> &, const std::shared_ptr<drachtio::SipTransport> &) const (__closure=0x7fffb31e3320, pA=std::shared_ptr (count 2, weak 1) 0x16bc110, pB=<error reading variable: Cannot access memory at address 0x128>)
    at ../src/sip-transports.cpp:466
466	      if (pB->isInNetwork(host.c_str())) {
(gdb) p pB
$2 = <error reading variable: Cannot access memory at address 0x128>
(gdb) p *pB
Could not find operator*.
(gdb) p host
$3 = "sip.telviva.com"
@davehorton
Copy link
Collaborator

this possibly looks to be a race condition where you get an incoming REGISTER before everything is initialized. Are you able to recreate the crash outside of that condition?

@davehorton
Copy link
Collaborator

also, can you provide a log at debug level for both drachtio server and sofia

@elbow
Copy link
Author

elbow commented Apr 20, 2020

Traffic comes thick and fast - there are quite a few thousand clients. So if drachtio crashes at a busy time you can be sure that as soon as it reads incoming traffic it will get a packet.

It is curious that it doesn't happen with the local request handler - every time it gets into this cycle of crashes we can change the request handler back to the local one and it comes up no problem.

I'll provide the extra logging etc but bear with me, we need to do some roll-back since we can't complete the move to the new k8s cluster.

Thanks,
Steve

@elbow
Copy link
Author

elbow commented Apr 20, 2020

For completeness, in another example it didn't crash immediately - responded to 55 registers and then crashed on the next

2020-04-20 05:59:16.885322 Starting drachtio version v0.8.4-2-gb9cf802
...startup logs...
2020-04-20 05:59:16.934375 Starting sofia event loop in main thread: 140495091922752
2020-04-20 05:59:16.934447 tport_type_udp.c:519 tport_udp_error() tport_udp_error: Connection refused (111) [icmp type=3 code=3]
2020-04-20 05:59:16.934488 tport_type_udp.c:524 tport_udp_error()       reported by [127.0.0.1]:0
2020-04-20 05:59:16.934530 nta.c:2867 agent_tp_error() nta_agent: tport: 127.0.0.1:6060: Connection refused
2020-04-20 05:59:17.073057 recv 656 bytes from wss/[105.242.164.9]:49280 at 05:59:17.072802:
REGISTER sip:rtc.telviva.com SIP/2.0
Via: SIP/2.0/WSS totdd2i7688f.invalid;branch=z9hG4bK5669083
Max-Forwards: 69
To: <sip:[email protected]>
From: "2307" <sip:[email protected]>;tag=bb5k9uq447
Call-ID: h09vii4mg3p9761eujlsu5
CSeq: 1158 REGISTER
X-PushId-Platform: web
X-PushId: 0d386491-360b-4589-a228-a5af7261f7af
Contact: <sip:[email protected];transport=ws>;+sip.ice;reg-id=1;+sip.instance="<urn:uuid:4fa5d347-ef52-4ef6-ad15-15a6dd94c044>";expires=600
Expires: 600
Allow: INVITE,ACK,CANCEL,BYE,UPDATE,MESSAGE,OPTIONS,REFER,INFO
Supported: path,gruu,outbound
User-Agent: TelvivaOne web 1.0 000
Content-Length: 0

2020-04-20 05:59:17.074767 RequestHandler::startRequest: sending http POST: http://10.151.192.13:8080/sip4.fibrephone.telviva.com/?method=REGISTER&domain=rtc.telviva.com&protocol=tcp&source_address=105.242.164.9&fromUser=7642307&toUser=7642307&uriUser=&contentType=&uri=sip%3artc.telviva.com
2020-04-20 05:59:17.159073 http 200 response received from server in 0.0828 secs: {"action":"route","data":{"tag":"rtcproxy"}}
2020-04-20 05:59:17.159359 No connected clients found to handle incoming register request
2020-04-20 05:59:17.159464 ClientController::selectClientForTag - no clients registered for tag: rtcproxy
2020-04-20 05:59:17.221003 SipDialogController::addIncomingRequestTransaction - adding transactionId bb6fab1a-1148-40be-bc36-ba878d2c6ffd for irq:0x1a8a7d0
2020-04-20 05:59:17.222293 send 320 bytes to wss/[105.242.164.9]:49280 at 05:59:17.221953:
SIP/2.0 480 Temporarily Unavailable
Via: SIP/2.0/WSS totdd2i7688f.invalid;branch=z9hG4bK5669083;received=105.242.164.9;rport=49280
From: "2307" <sip:[email protected]>;tag=bb5k9uq447
To: <sip:[email protected]>;tag=4jaFXUp90008j
Call-ID: h09vii4mg3p9761eujlsu5
CSeq: 1158 REGISTER
Content-Length: 0

54 more registers arrive and are responded to

then:

2020-04-20 05:59:29.406623 recv 655 bytes from wss/[129.205.174.139]:56818 at 05:59:29.406433:
REGISTER sip:rtc.telviva.com SIP/2.0
Via: SIP/2.0/WSS 0puo2tutr873.invalid;branch=z9hG4bK1066410
Max-Forwards: 69
To: <sip:[email protected]>
From: "115" <sip:[email protected]>;tag=4jktn6br06
Call-ID: pglmhgrc6lp4n62a82dl7o
CSeq: 1086 REGISTER
X-PushId-Platform: web
X-PushId: d2f4b2e5-a4b6-4f33-bca0-1bebae00fb9b
Contact: <sip:[email protected];transport=ws>;+sip.ice;reg-id=1;+sip.instance="<urn:uuid:215f487d-34d6-41a7-a53f-59abe2e36dea>";expires=600
Expires: 600
Allow: INVITE,ACK,CANCEL,BYE,UPDATE,MESSAGE,OPTIONS,REFER,INFO
Supported: path,gruu,outbound
User-Agent: XxxOne web 1.0 000
Content-Length: 0

2020-04-20 05:59:29.406967 RequestHandler::startRequest: sending http POST: http://10.151.192.13:8080/sip4.fibrephone.xxx.com/?method=REGISTER&domain=rtc.xxx.com&protocol=tcp&source_address=129.205.174.139&fromUser
=9240115&toUser=9240115&uriUser=&contentType=&uri=sip%3artc.xxx.com
2020-04-20 05:59:29.447798 http 200 response received from server in 0.0395 secs: {"action":"route","data":{"tag":"rtcproxy"}}
<<<CRASH>>>
2020-04-20 05:59:29.845084 Starting drachtio version v0.8.4-2-gb9cf802

@elbow
Copy link
Author

elbow commented Apr 20, 2020

My colleague reports:

I applied the iptables rules but and waited 5 minutes. By that time rtcproxy had established a connection. As soon as I removed them I got crashes

So from that report it can't be a race during initialisation? - drachtio been running a long time and the client inbound connection all setup. Still crashes.

@davehorton
Copy link
Collaborator

agree, I'll wait for the logs ...

@davehorton
Copy link
Collaborator

any luck in gathering logs?

@elbow
Copy link
Author

elbow commented Apr 21, 2020

Hi Dave,

Because it is disruptive to the service we have to wait until our late evening to do the testing again.

That is in say 6 hours time.

@elbow
Copy link
Author

elbow commented Apr 21, 2020

@davehorton You won't be happy with this update. However:

If we run drachtio-server against our remote request-handler with normal logging we get that segfault very often:

Tue 2020-04-21 18:33:04 SAST  15145     0     1  11 * /usr/local/bin/drachtio
Tue 2020-04-21 18:33:05 SAST  15155     0     1  11 * /usr/local/bin/drachtio
Tue 2020-04-21 18:33:09 SAST  15166     0     1  11 * /usr/local/bin/drachtio
Tue 2020-04-21 18:33:33 SAST  15198     0     1  11 * /usr/local/bin/drachtio
Tue 2020-04-21 18:33:33 SAST  15184     0     1  11 * /usr/local/bin/drachtio
Tue 2020-04-21 18:33:35 SAST  15208     0     1  11 * /usr/local/bin/drachtio
Tue 2020-04-21 18:33:52 SAST  15244     0     1  11 * /usr/local/bin/drachtio
Tue 2020-04-21 18:33:52 SAST  15221     0     1  11 * /usr/local/bin/drachtio
Tue 2020-04-21 18:33:53 SAST  15254     0     1  11 * /usr/local/bin/drachtio
Tue 2020-04-21 18:34:21 SAST  15265     0     1  11 * /usr/local/bin/drachtio
Tue 2020-04-21 18:34:21 SAST  15286     0     1  11 * /usr/local/bin/drachtio
Tue 2020-04-21 18:34:23 SAST  15297     0     1  11 * /usr/local/bin/drachtio

If we push the sofia logging up to 9 and switch drachtio logging to debug:

        <!-- sofia (internal sip library) log level, from 0 (minimal) to 9 (verbose) -->
        <sofia-loglevel>9</sofia-loglevel>

        <!-- general process log level: notice, warning, error, info, debug.  Default: info -->
        <loglevel>debug</loglevel>

Then we don't get any crashes.

All I can guess is that the time taken to do the logging is enough to slow something down so that the race is avoided.

So we can't give you debug level logs of the crash since there is no crash if the logging is set to debug level.

We restarted Drachtio a bunch of times in case it was intermittent and cou;dn't get any crash to happen.

How do you want us to proceed?

Thanks,
Steve

@davehorton
Copy link
Collaborator

could you try with drachtio at debug and sofia at log level 3?

@davehorton
Copy link
Collaborator

also, your logs are showing an error right after startup, before any calls:

2020-04-20 20:31:38.385746 Starting sofia event loop in main thread: 139745329432384
2020-04-20 20:31:38.385807 tport_type_udp.c:519 tport_udp_error() tport_udp_error: Connection refused (111) [icmp type=3 code=3]
2020-04-20 20:31:38.385841 tport_type_udp.c:524 tport_udp_error()       reported by [127.0.0.1]:0

I'd like to get to the bottom of this as well. Can you just show me a log with debug level (drachtio and sofia) after startup? In that log it should not be necessary to receive any calls

@elbow
Copy link
Author

elbow commented Apr 21, 2020

drachtio.log

This is from startup for the first few seconds. As it start a bunch of packets arrive and I cut the log after the first few replies go back.

My drachtio config binds to "*" like so, which I guess is where the loopback address come from:

       <contacts>
                <contact>sip:*:6060;transport=udp,tcp</contact>
                <contact>sips:*:6061;transport=tls</contact>
                <contact>sips:111.222.159:4433;transport=wss</contact>
        </contacts>

Steve

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants