There are many issues that can cause problems with applications and services. In this video, you’ll learn about untrusted SSL certificates, exhausted DHCP scopes, unresponsive services, and much more.
If your domain name services aren’t working, then you’re going to have a difficult time resolving an IP address from a fully-qualified domain name. Web browsing and other applications aren’t going to work, and it will seem as if the entire internet is down. You may want to try pinging IP addresses instead of a fully-qualified domain name to see if you’re at least able to have connectivity to the network. And if you are able to ping these other devices by IP address, then you don’t have a network connectivity issue.
But of course, our applications commonly use fully-qualified domain names instead of IP addresses. So if you’re not able to make that resolution, your applications are not going to be able to function. The first thing you may want to check is the IP address of your local device. If you’re able to ping a device that’s on another subnet, then you know you have the correct IP address, subnet mask, and default gateway. But you may want to check the IP configuration for your DNS servers. Make sure that IP addresses are listed under your DNS server configuration and make sure they are the right IP addresses for your DNS servers.
You can then open a command prompt and use nslookup or dig to perform queries against that DNS server. You want to see if you’re able to receive responses for the services that you would like to access. And if those DNS servers are not responding, you may want to try a different DNS server. Google’s DNS servers are 8.8.8.8 and 8.8.4.4. Or you may want to try the server at Quad9, which is 9.9.9.9.
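If you would rather script this check than run nslookup by hand, something like the following Python sketch might work. The function name `diagnose_dns` and its injectable `resolver` parameter are assumptions made for this illustration; by default it uses whatever DNS servers the operating system is configured with.

```python
import socket

def diagnose_dns(hostname, resolver=socket.getaddrinfo):
    """Return the sorted IP addresses for hostname, or an empty list on failure."""
    try:
        results = resolver(hostname, None)
    except socket.gaierror:
        # The name could not be resolved; the DNS configuration is suspect.
        return []
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({entry[4][0] for entry in results})
```

An empty list for a name you know is valid, combined with a successful ping to an IP address, points at name resolution rather than network connectivity.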
If the IP configuration on your device is not correct, you may see a number of different symptoms occur. One of these might be that you can communicate to local IP addresses, but you’re not able to communicate to IP addresses on a different subnet. Or you may find that there’s no IP communication at all, and you can’t communicate with devices on your local subnet or a remote subnet. Or you may find that some IP addresses on your local subnet are accessible, but others are not accessible from your machine.
The first thing you want to do is check your documentation and make sure that you have the correct IP address for your subnet. You’ll want to check your computer’s IP address, subnet mask, and default gateway. And you want to make sure that matches what you show in your documentation. If you think your switch is configured with the wrong VLAN information and you’re on the wrong IP subnet, you should be able to capture packets and at least see some information appear from your local subnet. That might give you some clues as to which subnet you’re connected to.
If you’re not on your network or you don’t have access to the documentation, you may want to look at other devices around you that seem to be working. You can look at their IP address, subnet mask, and default gateway, and see if that matches the subnet for your device. And of course, the problem may be associated with something else in your infrastructure. So you might want to perform some pings and traceroutes, and see just how far you’re able to get outside of your local subnet.
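The address, subnet mask, and default gateway checks can be expressed in a few lines of Python using the standard `ipaddress` module. This is only a sketch; the function name `check_ip_config` and the wording of its messages are assumptions for this example.

```python
import ipaddress

def check_ip_config(address, netmask, gateway):
    """Return a list of problems found in a host's IPv4 settings."""
    problems = []
    interface = ipaddress.ip_interface(f"{address}/{netmask}")
    network = interface.network
    # The default gateway has to sit on the same subnet as the host.
    if ipaddress.ip_address(gateway) not in network:
        problems.append(f"gateway {gateway} is outside {network}")
    # A host should not be using the network or broadcast address.
    if interface.ip in (network.network_address, network.broadcast_address):
        problems.append(f"{address} is the network or broadcast address of {network}")
    return problems
```

An empty list means the three values are at least consistent with each other. It does not confirm they match your documentation or the VLAN the switch port is actually in.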
Some network administrators prefer to manually configure the IP addresses on all of their devices. They don’t have a DHCP server, so they have to be very careful that they’re not duplicating any IP addresses between devices. But of course, DHCP doesn’t guarantee that you’re not going to have duplicate IP addresses. You may find a combination of static IP addresses and an overlap with the DHCP pools, or you may have multiple DHCP servers, and you’ve accidentally configured duplicate IP addresses on both of those servers, or someone may turn on their own DHCP server without your knowledge, and now a rogue DHCP server is handing out IP addresses.
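One of those overlaps, static addresses that fall inside a DHCP pool, is easy to check for if you keep a list of your static assignments. The Python sketch below assumes the pool is a simple start-to-end range; the function name `statics_inside_pool` and the addresses in the usage note are hypothetical.

```python
import ipaddress

def statics_inside_pool(static_addresses, pool_start, pool_end):
    """Return the static addresses that fall inside the DHCP pool range."""
    start = ipaddress.ip_address(pool_start)
    end = ipaddress.ip_address(pool_end)
    return [a for a in static_addresses if start <= ipaddress.ip_address(a) <= end]
```

For example, with a pool of 192.168.1.100 through 192.168.1.200, a printer statically set to 192.168.1.120 would be reported, because the DHCP server could hand that same address to another device.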
If two devices manage to connect to your network with the same IP address, you’ll find that they’ll fight with each other. One device will have connectivity and then the other device has connectivity, and it’ll switch back and forth between the two devices. However, on most modern operating systems, the OS performs a check of that IP address before it connects to the network. And if it finds that IP address is already in use, it blocks your system from creating a duplicate.
To troubleshoot these duplicate IP addresses, you can start with the devices that are being manually configured. Check the IP address, subnet mask, and default gateway for your specific workstation, and make sure it matches your documentation. Another thing you can do is before bringing that station online, use a third station to be able to ping that IP address and see if another device responds. If another device does respond, you know that IP address should not be manually configured on another device.
If you are manually configuring the IP address and you know it’s the right address, but some other device is already using it, you can use that third device to ping that IP address, find the MAC address of the device that responds, and then locate that MAC address in your switch. That should tell you what interface that device is connected to.
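If you collect IP-to-MAC pairs over time, for example from repeated looks at an ARP cache, a duplicate IP address shows up as one IP answering from more than one MAC address. Here is a minimal Python sketch of that idea; the function name `find_duplicate_ips` and the input format (a list of `(ip, mac)` tuples) are assumptions for this example.

```python
from collections import defaultdict

def find_duplicate_ips(observations):
    """Map each IP seen with more than one MAC to the sorted list of those MACs."""
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        # Normalize case so the same MAC is not counted twice.
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}
```

Each MAC address in the result can then be looked up in the switch to find the interface where that device is connected.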
If you think you’re getting this duplicate IP address from a DHCP server, you may want to capture the packets associated with the DHCP process, and you’ll be able to tell exactly which DHCP server is providing you that duplicate IP address.
One type of duplicate address you don’t see very often is a duplicate MAC address. MAC addresses are burned into the network interface card. It’s very unusual to see two interface cards with exactly the same MAC address.
If you do see a duplication of MAC addresses, it could be something innocuous, like someone had misconfigured a manual MAC address configuration. Man-in-the-middle attacks can sometimes spoof existing MAC addresses, so you may want to check and make sure there are no security concerns on your network. Usually though, the problem is more benign. The issue may be related to a locally-administered MAC address that has been misconfigured in a system, or sometimes you will run into a manufacturing error where two different interface cards have the same burned-in address.
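You can tell whether a MAC address is locally administered or burned in by looking at one bit: in the first octet, the bit with value 0x02 is set for locally administered addresses. The Python sketch below checks that bit; the function name `is_locally_administered` is an assumption for this example, and it expects the address written with colons or hyphens.

```python
def is_locally_administered(mac):
    """Return True if the locally-administered bit (0x02 of the first octet) is set."""
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    return bool(first_octet & 0x02)
```

If a duplicated MAC address has this bit set, a manual configuration is the more likely cause. If it is clear, you may be looking at a spoofed address or, rarely, a manufacturing error.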
If you do see duplicate MAC addresses on your network, you may find that those devices have intermittent connectivity. The switch is going to be confused about exactly where that MAC address happens to be on the network. If you’re trying to confirm the MAC address of a device, you may want to ping the IP address of that device and then look at your ARP cache to see exactly what MAC address is associated with that IP.
If you’re using DHCP on your network, you know that most devices will try to renew their IP address halfway through the lease time. If you find that the DHCP-assigned IP address of a device is expiring, this may indicate a problem with the DHCP server. If a DHCP server is not available to renew that IP address, then the client will give up that IP address at the end of the DHCP lease.
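To see when a client should be renewing, you can work out the lease timers from the lease start time and duration. The Python sketch below uses the halfway point for renewal, as described above. It also computes a rebinding time at 87.5% of the lease, which is the default in RFC 2131 and an addition not mentioned in this video; a DHCP server can override both values. The function name `lease_timers` is an assumption for this example.

```python
from datetime import datetime, timedelta

def lease_timers(lease_start, lease_seconds):
    """Return (renewal, rebinding, expiry) times for a DHCP lease."""
    # T1: the client first tries to renew with its original server at 50%.
    renewal = lease_start + timedelta(seconds=lease_seconds * 0.5)
    # T2: the client tries any DHCP server at 87.5% (RFC 2131 default).
    rebinding = lease_start + timedelta(seconds=lease_seconds * 0.875)
    expiry = lease_start + timedelta(seconds=lease_seconds)
    return renewal, rebinding, expiry
```

If a device is reaching the expiry time without renewing, the renewal requests sent from the halfway point onward are probably going unanswered.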
We know that if a device’s IP address starts with 169.254, then it has an automatic IP address assignment, and it was not able to retrieve a DHCP-assigned address. Your first place to go then would be your DHCP server. Make sure that you have addresses available in the pool and that the DHCP server is working normally.
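Spotting these automatic addresses in a list of devices is a one-line check in Python. This is a small sketch; the function name `is_apipa` is an assumption for this example, and 169.254.0.0/16 is the IPv4 link-local range that the 169.254 prefix refers to.

```python
import ipaddress

# IPv4 link-local range used for automatic (APIPA) addressing.
APIPA_NETWORK = ipaddress.ip_network("169.254.0.0/16")

def is_apipa(address):
    """Return True if the address is a self-assigned 169.254.x.x address."""
    return ipaddress.ip_address(address) in APIPA_NETWORK
```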
What if someone happens to install a DHCP server on your network and starts handing out IP addresses to anyone who might need them? This would be a rogue DHCP server. And because there’s no security inherent to DHCP, this might be something very easy for someone to configure and put on your network. This could mean that someone might be assigned an invalid or duplicate IP address. And that, of course, would affect many devices on the network and would probably prevent many clients from being able to communicate to other devices.
One way to disable this rogue DHCP server is to enable security on your switch. There’s a function called DHCP snooping that may be able to identify rogue DHCP devices, and you may be able to authorize DHCP devices in Microsoft’s Active Directory. And only those devices would be allowed to hand out DHCP addresses.
To resolve this problem, you would first have to identify the rogue DHCP server and disable it. You would then need to find all of the devices that received an IP address from that server, have them release that IP address, and then renew with the normal DHCP servers.
If you’re communicating to a web server over an encrypted channel, and you receive a pop-up message in your browser, and this error says that the certificate is not trusted by your computer’s operating system, then you may have a problem communicating securely to that web server.
This means that your browser received the certificate from the web server, but the certificate authority that signed that certificate is not in the browser’s configuration. So the browser doesn’t trust that certificate. This could be that the certificate itself has not been signed by a certificate authority or the certificate authority that has signed the certificate is not part of the trusted certificate authorities that are listed in your browser. You need to look at the certificate details itself. It will tell you what the issuing CA happens to be.
On this particular certificate, you can see there is an issuer that is CAcert.inc. And that CAcert certificate does not exist inside my browser, so my browser is not going to trust that web server. If you’re communicating to an internal web server on your company’s network, then you may need to add your company’s certificate authority to your browser. Normally this internal certificate is added by your workstation administration team, but you could manually add that certificate as well.
Configuring the date and time on all of the devices on your network becomes very important when you’re trying to implement security. For example, the default tolerance for Kerberos is a five-minute window. So you have to have very tight tolerances on the time and date on all of your devices. This is because Kerberos is assigning you a ticket, and that ticket has a time stamp associated with it. If that time stamp is too old, Kerberos considers that ticket to be invalid and then your client is not able to log in.
That’s why one of the first things we do when there’s a problem with Kerberos or being able to log in is to check the time stamp on the device that’s trying to gain access to the network. The easiest thing to do, of course, is to configure all of your devices with the network time protocol, or NTP. This makes it so that every device can automatically update its clock and stay in sync with one another.
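The time check itself is simple arithmetic. The Python sketch below compares a client clock against a server clock using the five-minute default tolerance mentioned above. The function name `within_kerberos_skew` is an assumption for this example, and it leaves out how you would actually obtain the two time stamps.

```python
from datetime import datetime, timedelta, timezone

# Default tolerance described above; an administrator may configure a different value.
MAX_SKEW = timedelta(minutes=5)

def within_kerberos_skew(client_time, server_time, tolerance=MAX_SKEW):
    """Return True if the two clocks differ by no more than the tolerance."""
    return abs(client_time - server_time) <= tolerance
```

Both values should be timezone-aware (or both in UTC) so that a time zone difference is not mistaken for clock drift.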
If you’ve ever managed a DHCP server, you know that you create a pool that has a certain number of available IP addresses. But what if you run out of addresses? In those particular situations, you’ll find the devices are not able to get an IP address from the DHCP server, and they’ll assign themselves an APIPA address. If you find that devices are assigning themselves an APIPA address instead of receiving a DHCP-assigned address, you may want to check your DHCP server and confirm that you have enough IP addresses available. And if possible, you may want to add additional IP addresses to the pool.
Exhausting a DHCP scope can sneak up on you, so you may want to implement some IP address management, or IPAM. This would allow you to monitor and get notification if your DHCP pool gets low. And if you have a lot of transient users that move in and out of the network every day, you might want to lower your lease time. This would allow more IP addresses to be released faster, and would provide a larger pool for other users that might need them.
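The kind of notification an IPAM tool gives you comes down to comparing active leases against the pool size. Here is a minimal Python sketch of that check. The function name `pool_status` and the 80% warning threshold are assumptions for this example, not values taken from any particular product.

```python
def pool_status(pool_size, active_leases, warn_at=0.8):
    """Return (utilization, state), where state is 'ok', 'low', or 'exhausted'."""
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    utilization = active_leases / pool_size
    if active_leases >= pool_size:
        state = "exhausted"   # new clients will fall back to APIPA addresses
    elif utilization >= warn_at:
        state = "low"         # time to grow the pool or shorten the lease
    else:
        state = "ok"
    return utilization, state
```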
On today’s networks, we’re adding many different security devices. And we may find that certain application flows may be blocked due to filters installed on a firewall. This could also be configured as an ACL on a router, and it would be restricting the access for an application to travel through that network device. These security checkpoints are usually configured with very conservative rules, and it’s not uncommon for these rules to block new applications from working on the network.
One of the first things you can confirm then is that there is some type of communication problem. If you perform a packet capture, you can see the application request, and then you can see that no response is received. From there, you may want to run a traceroute tool that allows you to customize the TCP or UDP port number that’s used. This would allow you to see just how far the traffic is able to go, and then you can provide that traceroute information to a network administrator who can then determine where the filtering is occurring.
A similar problem might occur if the application is being filtered on your device with a host-based firewall. A firewall administrator may be able to configure not just a port number, but the application name itself to be able to filter that traffic. In environments where the host-based firewall is administered centrally, you may not have access to view firewall information. So you may need to document exactly what application you need to use, and provide that information to the firewall administrator.
In these scenarios, you may want to perform a packet capture from an external device so you can see exactly the traffic that’s leaving that computer and the traffic that’s coming back.
Access control lists can provide extensive security options. You may find that they’re blocking some traffic from getting through, but other traffic is able to flow properly. If you were to look at the access control list, you can see there are a number of different filtering options. You can filter by IP address, port number, and many other options as well. And you can allow or deny traffic based on a combination of these criteria.
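To see how an access control list can let some traffic through and stop other traffic, it may help to model one. The Python sketch below assumes the common behavior where rules are checked in order, the first match wins, and anything that matches no rule is denied. Real devices vary, and the `Rule` fields and `evaluate` function are hypothetical names for this example.

```python
import ipaddress
from typing import NamedTuple, Optional

class Rule(NamedTuple):
    action: str                      # "permit" or "deny"
    source: str                      # source network in CIDR notation
    dest_port: Optional[int] = None  # None matches any destination port

def evaluate(rules, source_ip, dest_port):
    """Return the action of the first matching rule, or 'deny' if none match."""
    src = ipaddress.ip_address(source_ip)
    for rule in rules:
        source_matches = src in ipaddress.ip_network(rule.source)
        port_matches = rule.dest_port in (None, dest_port)
        if source_matches and port_matches:
            return rule.action
    return "deny"  # assumed implicit deny at the end of the list
```

With a list that denies port 23 from 10.0.0.0/24 and then permits everything else from that subnet, a host at 10.0.0.5 can reach port 443 but not port 23, which is exactly the "some traffic works, some doesn’t" symptom.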
If you’re trying to determine if an access control list may be blocking your traffic, you can perform a packet capture to see exactly what traffic you’re trying to send and what traffic is being received. You might also want to use a traceroute utility that allows you to customize the TCP or UDP port number. This would allow you to send traffic into the network, and you’d be able to tell at exactly which hop the traffic is stopping.
If you’re trying to communicate to a server and you’re not getting any response, and you know the problem isn’t related to a filter or an ACL, there may be a service that’s simply not responding to your request. You may want to check and make sure that you’re accessing that service over the correct UDP or TCP port number. And if it’s different, you need to make that change in your application.
You want to confirm that the device itself is up and running. You may want to run a ping or a traceroute to the device, and make sure that you’re able to communicate to that server successfully. And if you are, you might want to try telnetting to that particular port number itself and see if you’re able to make the application talk back to you. If that application isn’t responding, you may need to restart the application or restart the server where that application exists.
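If telnet isn’t installed, a few lines of Python can do the same basic test of whether a TCP port accepts a connection. This is a minimal sketch; the function name `port_is_open` and the three-second default timeout are assumptions for this example. It only applies to TCP services, and it confirms that something is listening, not that the application behind the port is healthy.

```python
import socket

def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A `False` result on a server that answers ping suggests the service itself is down or listening on a different port.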
If the server that is hosting that application is having a problem, the issue may be very similar to the service itself not responding. We try to use the application, and we get no response from that device. We’re going to first confirm the connectivity with a ping. If we do have a hardware failure, we’re probably not going to receive a response to this ping.
We can also confirm that with a traceroute. So we can see exactly what hops we’re going through to get to that server, but we can also see that the server is not responding to that traceroute. At that point, we want to check the server ourselves, or we’ll need to contact the help desk or server administrator to see if they can find out why that server is not responding.