GXP2135 Delay Dialling


#1

Hello,

I have a site with about 30 GXP2135 phones that handles around 560 calls per day.
Recently I’ve been having a lot of complaints (and have seen the problem myself) of random phones intermittently not “dialling” a call.

If they press a BLF or dial a number the phone sits there and doesn’t send the INVITE to the phone system (as shown by a packet capture on the phone). Eventually it may send it but more often than not it claims the call timed out before the INVITE leaves the phone.

The phones connect to the phone system using OpenVPN and then use the internal 10.0.1.X IP for the SIP registration and proxy server.
The phones have NAT Traversal set to “5” for VPN.

The VPN does not drop during the problem; pings remain constant and stable throughout.
Inbound calls to the phone are not affected. Only outbound is an issue.

I’m on the latest firmware of 1.0.11.64.

I am usually great on puzzles like this, but this time I’m totally stuck. I can’t figure out what the cause might be. Any help would be happily received.


#2

Go to the GS helpdesk; there is no way we can check a phone problem from here.
You can try SSHing into the unit and checking resource usage; maybe there is a memory leak or something.


#3

Registration for support “typically 5-8 business days”… I was hoping for faster help than that.

I’ve SSHd to the phones but do not know what I’d be looking for or what might be normal vs. abnormal.
Any advice via the forum?
Thanks


#4

I only use LAN-to-LAN IPsec VPN (DrayTek), and I’ve never seen this problem.


#5

Sadly some of the phones are in residential settings where I don’t control the router, so the phone needs to run the VPN.
I’m not sure the issue is the VPN anyway. The way it acts, it looks more like the phone software: the VPN is up and I can ping the phone, access the web interface, etc. The packet capture just shows the phone delaying the INVITE that starts the call.


#6

What VoIP server are you using?


#7

On a couple of phones you can try the new beta …71; if I’m not mistaken, it mentions improvements to VPN.


#8

Thanks! I’ll look at the firmware.

I’m using Asterisk.

Though I don’t think that is the issue either.

The INVITE never leaves the phone… but while the phone is sitting not sending the INVITE a keepalive OPTIONS will (or can) get sent from the phone system to the phone and gets responded to.
So again, communication seems fine… just the phone is sitting on the INVITE like it’s doing a DNS query, writing a file and waiting, or some other “blocking” activity until it then finally sends.

Edit: I wish they wrote more in the changelog about the changes. The various VPN changes make me hopeful that one of the changes improves my situation. I’ll report back after a few days.


#9

With a UCM630X and Remote Connect you would not have all these problems: no NAT to deal with and no need for any VPN.


#10

Thanks, if that can be popped in the post for free I’m all good. Asterisk allows near infinite customisation and has extensive API / AGI scripts in the setup.
Other non-Grandstream phones don’t exhibit the problem so I’m not entirely sure changing the phone system to suit a Grandstream only phone issue is the right solution.

In “good” news… the latest beta seems to have made the problem worse, which means debugging is going to be easier for me as it is now less intermittent and almost always happening :)


#11

UCM is a customized Asterisk. It certainly offers better security protection, additional services, and Remote Connect, which lets you connect remote IP phones without NAT configuration and without a VPN, at a significant cost saving. The free Remote Connect license allows a minimal level of use; beyond that there are paid licenses, with costs that are not comparable to those of VPNs or anything else.
They are choices.

N.B.: Honestly, I see problems with OpenVPN everywhere. It is probably fine for simple data exchange, but here we are talking about the SIP protocol, which is quite another thing.


#12

I spent an entire day on site and eventually figured it out.
The issue occurred both over OpenVPN and when going direct to the cloud Asterisk phone server, so that ruled out OpenVPN as the cause.

The phone would log (and tcpdump would show) the UDP INVITE packet leaving, but the server would not receive it. The phone would keep sending INVITEs as it didn’t get any response… When the server did eventually receive an INVITE packet and start the call, it wouldn’t be one of the retransmitted INVITEs… it would be the original one from ~30 seconds earlier!

The interesting part of the test, several hours in, was setting TCP SIP signalling instead of UDP.
The problems went away, but TCP SIP seems slow in other ways (takes ages to populate the BLF updates for example). I think it can’t handle a lot of commands in quick succession… TCP SIP might be ok for a single handset with no BLF and very few calls.

What did solve the issue was switching the OpenVPN transport to TCP, while keeping SIP signalling on normal UDP inside the tunnel.

To me it seems that the Grandstreams have an intermittent bug where UDP data can go missing or get stuck within the phone. Sometimes other data (SYSLOG) leaving the phone can “push” the stuck packet and make the problem less likely to happen.
The problem occurs even if you are going direct to a phone system “on the internet” or via a UDP OpenVPN to the server.
Changing data leaving the phone into TCP (even UDP SIP via a TCP OpenVPN) appears to solve the issue.

This issue never appears to affect setups where the phone server is within the same subnet/LAN.
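
For anyone who wants to check whether plain UDP is being delayed or dropped on their own path, a numbered UDP echo probe run from a computer on the same LAN as the phone is one option. The following is a rough Python sketch and was not part of the troubleshooting described above. The function names are made up for illustration, the echo side would have to run on a machine next to the phone server, and the demo only exercises it over loopback.

```python
import socket
import struct
import threading
import time

PROBE = struct.Struct("!Id")  # sequence number + send time (monotonic clock)


def run_echo_server(sock, stop):
    """Echo every datagram back to its sender until `stop` is set."""
    sock.settimeout(0.1)
    while not stop.is_set():
        try:
            data, addr = sock.recvfrom(2048)
        except socket.timeout:
            continue
        sock.sendto(data, addr)


def _drain(sock, rtts, until):
    """Record replies that arrive before the monotonic time `until`."""
    while True:
        remaining = until - time.monotonic()
        if remaining <= 0:
            return
        sock.settimeout(remaining)
        try:
            data, _ = sock.recvfrom(2048)
        except socket.timeout:
            return
        if len(data) != PROBE.size:
            continue  # not one of our probes
        seq, sent = PROBE.unpack(data)
        rtts[seq] = time.monotonic() - sent


def probe(server_addr, count=20, interval=0.05, wait=1.0):
    """Send `count` numbered datagrams and return {seq: round-trip seconds}.

    Sequence numbers missing from the result were lost, or took longer
    than `wait` seconds to come back after the last send.
    """
    rtts = {}
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for seq in range(count):
            sock.sendto(PROBE.pack(seq, time.monotonic()), server_addr)
            _drain(sock, rtts, time.monotonic() + interval)
        deadline = time.monotonic() + wait
        while len(rtts) < count and time.monotonic() < deadline:
            _drain(sock, rtts, min(deadline, time.monotonic() + 0.05))
    return rtts


def demo_loopback(count=5):
    """Run the probe against an echo server on 127.0.0.1 (demo only)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # let the OS pick a free port
    stop = threading.Event()
    worker = threading.Thread(target=run_echo_server, args=(server, stop))
    worker.start()
    try:
        return probe(server.getsockname(), count=count)
    finally:
        stop.set()
        worker.join()
        server.close()
```

A sequence number that is missing, or that comes back seconds late while its neighbours are fast, would point at the network path rather than the handset.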

Hope this finding helps someone else!


#13

There are a number of factors not covered in the observations:

  1. There was no mention of how OpenVPN was set up. That is, was it a site-to-site connection, or was the phone’s VPN client connecting to a server? I am assuming it was site-to-site, given that you indicated you could see the Invite leave the phone in a capture. Otherwise, if the phone were the VPN client, the Invite would have been encrypted and inside the tunnel when it left the phone NIC.
  2. What are the connection speeds between the location where the phones are and that of the PBX, in both directions?
  3. What tunnel speeds can the VPN server accommodate?
  4. If you have a record of the Invite, then why did the server not receive it? You indicated that the server would never receive it, but apparently the server “did eventually receive” it and was responding to the original Invite.

The tunnel transport you ultimately decided upon is TCP, but the traffic within the tunnel is still TCP or UDP. I am not following how or why there is a belief that the phone may have a bug where data gets stuck or goes missing, but this may have to do with whether the phone is the VPN client or not. If not, then the phone is not aware of the VPN and cannot tell a remote server from one on its own LAN, so it would have no basis for exhibiting the issue in your scenario but not when it and the server are both connected to the same LAN.

I do not use the OpenVPN function of the phone, but rather a site-to-site IPsec VPN. It has been a while since I even looked at the GS implementation, but earlier on it was nothing but a bag of worms, using a fairly old implementation that many had issues getting to work.


#14

The phone is the VPN client. But as mentioned, the problem still exists with the phone system made open to the internet and the VPN de-configured so data goes direct. It is indeed annoying you get the OpenVPN traffic if capturing on the phone while it is a client. Makes it difficult figuring out what it is doing, however you can get an idea of what the phone thinks it is doing by turning on SIP trace in syslog.

The same issue occurs in a residential setup with different equipment to the call center I was troubleshooting in. (So it isn’t the router or LAN).

The server is in a datacenter, the clients are on 80/20 Mbps VDSL. No performance issues between the two - that part of the setup RTT is very well graphed and monitored.

I’m unable to test the OpenVPN speed… as the phone is the client. I get expected line speeds when testing with OpenVPN connection from a computer.

Part of me wonders if it’s the number of BLFs and how “busy” the phone gets. I’ve got nearly the maximum number of pages set up, with every button having a BLF subscription. Each time a call comes in it’s a mass flurry of BLF updates… then when the call gets answered another flurry etc.
There is a lot of scope for threading and other programming bugs to rear their head compared to a simpler setup.


#15

Well, TCP is not ideal, as it is a stateful connection and is suited more to reliable delivery than to speed; UDP favors speed over reliability. If you have a fairly large number of BLFs, then hopefully you are using BLF Eventlist. When a request is sent, there is a timer (T1) which expects a response back within 500 ms. If that is not achieved, the request is resent again and again, but at increasing intervals: 500 ms, 1 sec, 2 sec, 4 sec, 8 sec and finally 16 sec, for a total of ~32 seconds. This is not the process for every request, nor do all user agents wait the full 32 seconds, as some may not want to wait that long. When the timer expires waiting for the response, the Invite will be cancelled, as it is assumed that the server (the PBX, in this case) went off-line.
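
For reference, the retransmission timing described above can be worked out in a few lines of Python. This is only a sketch of the RFC 3261 defaults for an INVITE over UDP (T1 = 500 ms, the wait doubling each time, giving up at 64×T1). The function name is made up for illustration, and a given phone or PBX may be configured with different values.

```python
T1 = 0.5  # seconds; RFC 3261 default round-trip estimate


def invite_retransmit_schedule(t1=T1):
    """Return (interval, elapsed) pairs for INVITE retransmissions over UDP.

    The first INVITE goes out at t=0. Each retransmission waits twice as
    long as the previous one, and the transaction times out at 64*T1.
    """
    timeout = 64 * t1
    schedule = []
    interval, elapsed = t1, 0.0
    while elapsed + interval < timeout:
        elapsed += interval
        schedule.append((interval, elapsed))
        interval *= 2
    return schedule


for interval, elapsed in invite_retransmit_schedule():
    print(f"wait {interval:>4.1f}s -> retransmit at t={elapsed:>4.1f}s")
print(f"transaction times out at t={64 * T1:.0f}s")
```

With the defaults the waits are 0.5, 1, 2, 4, 8 and 16 seconds, so the last retransmission goes out at 31.5 s and the transaction gives up at 32 s.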

This is one reason I question the bug thought, as the SIP stack would still be looking for a response and presumably resending whenever nothing is seen within each retransmission interval following each Invite.

The BLF is one message that does not get repeated. It is one and done. As the BLF is apt to be so dynamic, there is the issue of trying to keep the phones’ LEDs synchronized to the physical state of the calls. There is one Notify message sent for a status update and if the phone fails to OK the message, then the phone is removed from the subscription list and no more updates will be sent until the subscription is renewed (by a reboot or the natural subscribe process, whichever comes first). The idea is that if the phone missed the Notify, then it is likely off-line and there is no point trying to re-send (taking up time) or sending new messages to what is thought to be an off-line phone.

The only thing I can say is that I have sites with 40+ phones using direct Internet connections without any of the issues you have reported, and both smaller and larger sites with IPsec VPNs, also without issues. Delayed dialing has never once been brought to my attention, but I would file a ticket and see what GS can offer.

To me, the delay issue is something that needs looking into. As stated, I do not use the phone’s OpenVPN function so…who knows?