SIP Accounts remain unregistered on network restoration - DNS SRV Failure

bug

#1

I’ve been keeping the WP800 in my work bag and travelling back and forth from office to home frequently with little issues, but occasionally the WP800 will not be able to register to it’s configured SIP accounts when WiFi connectivity to a known network is restored. Re-connecting to the same network will resolve the problem.

I was finally able to capture it today with the syslog active at my work network, and the root cause seems to be a DNS lookup failure, despite all other service working fine, and DNS servers assigned by DHCP being valid.

**Trying to upload the full debug output now, but it’s too large - "Sorry, that file is too big (maximum size is 30072kb). Why not upload your large file to a cloud sharing service, then share the link?
** Removed older syslog files from debug file and uploaded.

There are other errors in the syslog, but these are the only ones to stand out to me:

Jun 4 10:15:51 192.168.0.29 WP800: [00:0B:82:D6:4C:60][1.0.0.16] DNS::lookup: Failed to resolve den-phone-4.fluentcloud.com
Jun 4 10:15:51 192.168.0.29 WP800: [00:0B:82:D6:4C:60][1.0.0.16] DNS::lookup: Failed to resolve nyc-phone-6.fluentcloud.com

As for the logcat, there also appears to be an error thrown right as the WiFi network is re-connected, but I’m not as well versed in android logcats, and this may not be an issue:

06-04 10:15:51.702 2816 2878 E ActivityManager: Sending non-protected broadcast android.net.conn.NETWORK_PROPERTIES_CHANGED from system 2816:system/1000 pkg android
06-04 10:15:51.702 2816 2878 E ActivityManager: java.lang.Throwable
06-04 10:15:51.702 2816 2878 E ActivityManager: at com.android.server.am.ActivityManagerService.broadcastIntentLocked(ActivityManagerService.java:18686)
06-04 10:15:51.702 2816 2878 E ActivityManager: at com.android.server.am.ActivityManagerService.broadcastIntent(ActivityManagerService.java:19307)
06-04 10:15:51.702 2816 2878 E ActivityManager: at android.app.ContextImpl.sendStickyBroadcastAsUser(ContextImpl.java:1210)
06-04 10:15:51.702 2816 2878 E ActivityManager: at com.android.server.ConnectivityService.updateLinkProperties(ConnectivityService.java:4433)
06-04 10:15:51.702 2816 2878 E ActivityManager: at com.android.server.ConnectivityService.updateNetworkInfo(ConnectivityService.java:5124)
06-04 10:15:51.702 2816 2878 E ActivityManager: at com.android.server.ConnectivityService.-wrap32(ConnectivityService.java)
06-04 10:15:51.702 2816 2878 E ActivityManager: at com.android.server.ConnectivityService$NetworkStateTrackerHandler.maybeHandleNetworkAgentMessage(ConnectivityService.java:2166)
06-04 10:15:51.702 2816 2878 E ActivityManager: at com.android.server.ConnectivityService$NetworkStateTrackerHandler.handleMessage(ConnectivityService.java:2280)
06-04 10:15:51.702 2816 2878 E ActivityManager: at android.os.Handler.dispatchMessage(Handler.java:110)
06-04 10:15:51.702 2816 2878 E ActivityManager: at android.os.Looper.loop(Looper.java:203)
06-04 10:15:51.702 2816 2878 E ActivityManager: at android.os.HandlerThread.run(HandlerThread.java:61)
06-04 10:15:51.754 2816 2870 E WifiScanner: listener cannot be found
06-04 10:15:51.801 2816 2835 E BatteryStatsService: no controller energy info supplied

And finally, in the debug output, the network status lists no DNS servers, but the status page in the Web UI does.

/********** Network Info **********/
MAC: 00-0b-82-d6-4c-60
IP type: DHCP
IP address: 192.168.0.29
Subnet mask: 255.255.255.0
Gateway: none
DNS server 1:
DNS server 2:

imagedebugInfo_20180604145724.tar (2.2 MB)


#2

I will go ahead and leave the phone in this state so I can provide any additional information as requested.


#3

Hi,

Thanks for the debug we will review it. Did you have a full packet capture in the original large file? You can use this service to uploaded it: https://www.sendtransfer.com/. Set the email to: ricky.chen@grandstream.com.


#4

Sorry for the delay in responding. The original debug attachment has the full pcap file as captured at the time.

I have run another slightly longer pcap, including a re-connection to the same WiFi network, and was able to capture the problem again, and I believe I’ve spotted the issue. DNS SRV lookups are being sent to IP address of my home network router (192.168.1.1) and not the DNS servers it currently has at my work network (192.168.0.1, 8.8.8.8).

Interestingly, A record lookups are still being sent normally to the current DNS server for the active WiFi network, which would explain why anyone using an A record \ IP address for the SIP server address are not impacted.

Screenshot of relevant DNS packets and full pcap file attached. As a side note, why is the WP800 sending an NTP packet to hshh.org?20180606174330.pcap.zip (24.7 KB)


#5

Just to confirm, as expected, rebooting the WP800 does clear the problem and DNS SRV queries will be sent to the correct IP address.


#6

Hi,

Thanks for the capture. We will investigate the issue.


#7

Hi,

By any chance did you configure alternate DNS server? Could you please provide a phone configuration as well? (Web UI->Maintenance->Upgrade->Config File->Download device configuration)

Thanks


#8

@bnelsonfs

Could you please provide us the phone configuration? Also you can double check the DNS by SSH to the phone via a tool like Putty and then after login type “status.”

Thanks


#9

Sorry for the delay,

I have not actually had the phone go into this state again recently, but I will run the status command via ssh and pull the config again if \ when it does. For now, here is the current config, and status output while the phone is working correctly.

WP800> status

Vendor: Grandstream
Product Model: WP800
Hardware Revision: V1.4A
Part Number: 9620008214A
MAC Address: 00:0B:82:D6:4C:60

NETWORK:
Interface: wlan0
Network Type: DHCP
IPv4 Address: 192.168.0.29
IPv6 Address: Unknown
Gateway: 0.0.0.0
Netmask: 255.255.255.0
DNS1: 192.168.0.1
DNS2: 8.8.8.8

Sotfware Version:
Preloader: 1.0.0.5
LK: 1.0.0.8
Kernel: 1.0.0.16
Recovery: 1.0.0.1
System: 1.0.0.16
Android: 7.0

Account Info:
Account1 Name: 6627
SIP User ID: 6627-FluentStream
Registration Status: Registered

Account2 Name:		410
SIP User ID:		410-FluentCloud
Registration Status:	Registered

Memory Usage:
MemTotal: 1002628 kB
MemFree: 12724 kB
MemAvailable: 529600 kB
Buffers: 56700 kB
Cached: 457640 kB

System uptime: 612580.44 126329.63
System load: 8.80 9.04 8.68 2/870 16110
config.txt.zip (3.7 KB)


#10

Hi,

Thanks for the info. I see you have set the alternate DNS to 0.0.0.0 and you might want to just clear that field. On v1.0.0.16, there is a problem with alternate DNS but I’m not sure if this value has any effect at all. If you had set a proper value there it will take over DNS1. This bug has already been addressed in the current internal build so we think there shouldn’t be any further instances of this issue.


#11

I’m a bit confused. The DNS in use on both networks is just the DHCP issues DNS servers, and I have not set any manual network settings at all on my test unit. I can see the the Gateway address is listing 0.0.0.0 in the status output (which is odd since it is correct in the Web UI).

Now, when browsing to the Advanced Network settings in the Web UI, I do see what you are talking about:

My assumption is that this is the default value for some reason? I will factory reset the phone and see if it is set like this when it boots back up with no settings from me.


#12

Hi,

The gateway display 0.0.0.0 is a bug I believe, and the web display it correctly. For the alternate DNS, the current firmware default those fields to blank so I was wondering if you had configured it. The fixes we made recently for the DNS should correct the issue you experienced.

Thanks


#13

@bnelsonfs

Has this particular DNS issue reoccurred on 1.0.1.5? Please update your findings.

Thanks


#14

Sorry for the delay, I have just today gotten the time to install the new firmware and start testing again.

I can say that after installing the new firmware, and factory resetting after, the alternate DNS server still auto-populates to 0.0.0.0 in the web UI, however this appears to just be an issue with the web code, as it’s not present in the config export.

It does end up getting saved into the config if you don’t touch the values and save some other setting on the advanced network page, so it can still be an issue it seems?

I’ll update again once I’ve been able to register an account and roam across different AP’s as I was before when the registration failed.


#15

Hi,

We noted the concern with the default value set to 0 and will address it. However it doesn’t seem to impact the DNS functionality. If you are able to reproduce the problem let us know right away.

Thanks


#16

I was able to reproduce this same problem today on the latest firmware release.

DNS SRV queries are sent to the DNS server issued via DHCP by the last connected network, not the currently connected network. Other DNS query types such as A records go to the correct current DNS server issued by the current network. New debug attached.

Network 1 details:
Router:192.168.1.1
Subnet: 255.255.255.0
DNS issued by DHCP: 192.168.1.1
No secondary DNS server

Network 2 details:
Router:192.168.0.1
Subnet: 255.255.255.0
DNS issued by DHCP: 192.168.0.1
Secondary DNS issued by DHCP: 8.8.8.8

WP800.zip (146.5 KB)

At this point, I don’t this this is at all related to the 0.0.0.0 problem.


#17

Hi,

Thanks for testing this issue again. Could you also advise us the steps to reproduce this problem?

For example,

  1. Connect to AP1
  2. Check trace
  3. Connect to AP2
  4. Check trace
  5. Observe DNS behavior is abnormal

Thanks


#18

Yeah, that’s basically the steps, though it doesn’t seem to happen 100% of the time unfortunately. The exact steps this morning:

  1. Attached to AP1 overnight, no calls made, but registered. Network 192.168.1.x , DNS server 192.168.1.1
  2. 40 minute drive to work, no connectivity to any AP during this time
  3. Attached to AP2. Network 192.168.0.x, DNS servers 192.168.0.1, 8.8.8.8
  4. 20 minute time period before checking on status of phone - unregistered
  5. Login to phone, and start debug capture, wait about 5 minutes to ensure capture of failures.

Interestingly, it seems to only happen when moving from AP1 to AP2. When returning home to AP1 from AP2, I have not yet seen an issue with registration, though due to the intermittent nature of this problem, that may just be random chance. Maybe something to do with going from a single DNS server provided by DHCP to two DNS servers provided by DHCP is related?

Let me know if you need any more debug traces from me. The phone is currently still sitting here unregistered.


#19

I have seen this with earlier firmware and reported it early on.

I no longer go between multiple SSIDs/APs so I have no idea if this is still happening. It was also not all the time.


#20

Hi,

Thanks for the instructions. We’ll try to find the cause. I noticed your debug package didn’t contain the syslog. Were sending it for analysis anyways but it would be better for the SW team if the syslog was available. Could you please set the syslog level on device to debug and try to reproduce the issue again? After reproduce you can extract another package.