Finding and fixing common network issues can be a challenge for any network administrator. In this video, you’ll learn how to troubleshoot rogue DHCP servers, IP configuration issues, certificate problems, firewall settings, and more.
One of the challenges when managing a network that assigns IP addresses using DHCP is there’s no inherent security built into the DHCP protocol. That means that someone could start their own DHCP server on this network and begin handing out IP addresses from an unauthorized device. This means the devices could be assigned an IP address that is not appropriate for their subnet. Or they may be given an IP address that duplicates another IP address on that same subnet. These duplicate IP addresses could cause error messages on the screen, and those devices would disable their connectivity to the network.
Although there’s no security built into DHCP, we have added security features to our switches that can help us with these shortcomings. This security feature is often called DHCP snooping, and it’s a feature we would enable in our switch to identify where the legitimate DHCP servers might be. This allows us to tell the switch to block any DHCP communication that is coming from an unauthorized server. Active Directory also includes a feature that allows you to identify authorized DHCP servers and limit any non-authorized servers from communicating with Active Directory. And once you identify where this rogue DHCP server might be, you need to disable that device and have all of the devices on the network release their current IP address and obtain a new address from the legitimate DHCP server.
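To make the DHCP snooping idea a little more concrete, here is a minimal Python sketch of the decision a switch makes. It is only a simulation and not a switch configuration. The port name Gi0/1, the DhcpMessage class, and the should_forward function are all hypothetical names made up for this illustration.

```python
from dataclasses import dataclass

# Assumption: only the port facing the legitimate DHCP server is trusted.
TRUSTED_PORTS = {"Gi0/1"}

# DHCP message types that only a server should be sending.
SERVER_MESSAGES = {"OFFER", "ACK", "NAK"}


@dataclass
class DhcpMessage:
    port: str          # switch port the message arrived on
    message_type: str  # for example "DISCOVER", "OFFER", "REQUEST", "ACK"


def should_forward(msg: DhcpMessage) -> bool:
    """Drop server-side DHCP messages that arrive on an untrusted port."""
    if msg.message_type in SERVER_MESSAGES and msg.port not in TRUSTED_PORTS:
        return False
    return True
```

With this logic, a client on any port can still send a DISCOVER or REQUEST, but an OFFER arriving on an untrusted port would be dropped.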
You might also run into situations where there are no more IP addresses in the pool of your DHCP server. This is an exhausted DHCP scope, and any device that can’t receive an IP address from the DHCP server will usually assign itself an Automatic Private IP Addressing, or APIPA, address instead. One of the challenges with an APIPA address is that those devices will only be able to communicate on the local subnet. They will not be able to communicate to the internet.
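One quick way to recognize this situation is to check whether a device's address falls in the link-local range of 169.254.0.0/16, which is where APIPA addresses come from. The sketch below uses Python's standard ipaddress module. The function name is_apipa is just an illustrative name.

```python
import ipaddress

# The IPv4 link-local block used for automatically assigned private addresses.
APIPA_RANGE = ipaddress.ip_network("169.254.0.0/16")


def is_apipa(address: str) -> bool:
    """Return True if the IPv4 address is inside the link-local range."""
    return ipaddress.ip_address(address) in APIPA_RANGE
```

If is_apipa returns True for a workstation's address, that's a hint the device never heard back from a DHCP server.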
Your first stop should be with your DHCP server to see why it’s not assigning more IP addresses, and you should be able to recognize that your pool of IP addresses has been depleted. An IP address management, or IPAM, tool may be able to monitor your scopes and alert you if any of them become exhausted. And if you have people that are constantly coming into the office and leaving, you may want to lower the lease time so that addresses return to the pool sooner and are available for other people on the network.
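The kind of check a monitoring tool might perform on a scope can be sketched in a few lines of Python. This is not how any particular IPAM product works. The function names and the 90 percent threshold are assumptions chosen for the example.

```python
def scope_utilization(pool_size: int, active_leases: int) -> float:
    """Return the fraction of the DHCP pool that is currently leased."""
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    return active_leases / pool_size


def scope_needs_attention(pool_size: int, active_leases: int,
                          threshold: float = 0.9) -> bool:
    """Flag the scope once utilization reaches the (assumed) alert threshold."""
    return scope_utilization(pool_size, active_leases) >= threshold
```

Alerting before the pool is completely empty gives you time to shorten the lease time or expand the scope before users start receiving APIPA addresses.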
If you have a user that has some type of IP configuration problem, then they probably are able to communicate to local IP addresses but not any subnets that may be outside of your local subnet. Or they may not be able to communicate to any other device on the network even if those devices are on their local subnet. There might also be cases where there’s intermittent connectivity or they’re able to communicate to some subnets but not others.
The first step is to check the basic configuration of that device. We want to confirm the IP address, subnet mask, gateway, and DNS settings are correct for that device. We would then want to monitor the traffic. We can usually do this with a protocol analyzer or some other monitoring tool that you might have in your switch. This will allow you to get an idea of how much unicast traffic, broadcast traffic, or other types of communication are occurring on that network.
You can’t usually determine a subnet mask from a packet capture, but you are able to reference your original documentation and confirm that you’re using the correct subnet mask for that particular VLAN. If you still aren’t quite sure what the correct subnet mask or gateway might be, you may want to look at other devices that are on the same VLAN. A device that’s right next to you may already have a correct configuration and you can compare that to the configuration on your device. And it might be useful to begin gathering information about what devices you are able to communicate to and which ones you’re not able to communicate to. You can use the ping command and the traceroute command to gather information about what devices are responding and what subnet you’re able to communicate to.
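One configuration check that is easy to automate is confirming that the default gateway is actually on the same subnet as the device. If it isn't, the device can usually reach local addresses but nothing beyond them. Here is a minimal sketch using Python's ipaddress module. The function name gateway_is_local and the sample addresses are assumptions for illustration.

```python
import ipaddress


def gateway_is_local(ip: str, mask: str, gateway: str) -> bool:
    """Return True if the gateway falls inside the device's own subnet."""
    interface = ipaddress.ip_interface(f"{ip}/{mask}")
    return ipaddress.ip_address(gateway) in interface.network
```

For example, a device at 192.168.1.10 with a mask of 255.255.255.0 can use 192.168.1.1 as a gateway, but not 192.168.2.1.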
If you’re having a network connectivity problem over a fiber network, your issue may be related to light. Obviously, on a fiber optic network, we’re using light to be able to send information from one side of the network to the other. And if we have anything blocking that light, then we will have intermittent connectivity. One of the most common causes for this loss of light, or attenuation, on this fiber connection is a dirty connector. That’s why we always tell you to clean these connectors before installing them in your network equipment.
And it may be a good idea to reclean them if you’re troubleshooting this type of problem. And if you have a light meter, you can connect it up to both sides of this connection to see just how much light is coming over this fiber. That might give you a clue that somewhere along this fiber run you might have a problem that’s causing this increase in attenuation. You’ll then want to compare the results from your light meter to the documentation for your network device to see if it’s receiving enough light to be able to properly operate.
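The comparison between the light meter and the documentation is simple arithmetic, so it can be sketched in Python. The numbers used here are made up for the example, and the real transmit power and receiver sensitivity would come from the documentation for your own equipment.

```python
def link_loss_db(tx_dbm: float, rx_dbm: float) -> float:
    """Loss across the fiber run: transmitted power minus received power."""
    return tx_dbm - rx_dbm


def receive_power_ok(rx_dbm: float, sensitivity_dbm: float) -> bool:
    """True if the measured power meets the receiver's documented minimum."""
    return rx_dbm >= sensitivity_dbm
```

As a hypothetical example, if one side transmits at -3.0 dBm and the meter reads -9.5 dBm at the other side, the run is losing 6.5 dB. If the receiver is documented to need at least -14.0 dBm, that link should still have enough light to operate.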
Certificate errors are another important type of error message to watch for when you’re connecting to a third-party device. You might see a message that says the site’s security certificate is not trusted or you may see some other type of message that identifies that something isn’t quite right with that certificate. Most browsers will allow you to look at the certificate information that’s been received from that server. Sometimes you can click on the lock icon or some other button that provides that information. You’ll want to check to see if that certificate has the correct domain name and that it hasn’t expired. You also want to be sure that the certificate has been properly signed by a trusted certificate authority and that the certificate shows as being valid. If any of these values are not correct, the certificate will not be validated properly by your browser, and you’ll get these error messages on your screen. This could be related to a misconfiguration of the certificate on the server, or it could indicate that a third party is trying to perform an on-path attack.
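The date and name checks described here can be illustrated with a short Python sketch. It assumes you already have the certificate details as a dictionary in the shape returned by the standard library's SSLSocket.getpeercert(). The function name certificate_problems is hypothetical, the name check is simplified to exact matches only (no wildcard handling), and it does not verify the signature chain. A real TLS library performs all of these checks for you.

```python
import ssl
import time
from typing import Optional


def certificate_problems(cert: dict, hostname: str,
                         now: Optional[float] = None) -> list:
    """Return a list of simple problems found in a certificate dictionary."""
    now = time.time() if now is None else now
    problems = []
    # Validity period: the certificate must have started and not yet expired.
    if now < ssl.cert_time_to_seconds(cert["notBefore"]):
        problems.append("not yet valid")
    if now > ssl.cert_time_to_seconds(cert["notAfter"]):
        problems.append("expired")
    # Name check: simplified to an exact match against the DNS names listed.
    names = [value.lower() for kind, value in cert.get("subjectAltName", ())
             if kind == "DNS"]
    if hostname.lower() not in names:
        problems.append("name mismatch")
    return problems
```

An empty list means neither of these basic checks failed, which is different from saying the certificate is fully trusted.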
Sometimes, the problems we have with communication on the network are related to the devices themselves not operating properly. This hardware failure can be difficult to troubleshoot because a device that’s no longer operating gives you no response at all. So you might ping the device and receive no response in return. You can run a traceroute, which would get you close to the device, but if this is a router that has failed, you won’t get any response from that IP. Ultimately, you may have to physically visit that device to see if it’s powered on and if there are any lights or error messages on the machine, and that might take you further down the process of troubleshooting.
We often have firewalls on our network that are protecting our data center or they’re sitting between us and the internet. But if there are configurations in the firewall that limit certain applications, port numbers, or protocols, then we may find our applications are not going to work properly. We also have to think about the host-based firewalls that are usually enabled on all of our local endstations. So we’ll want to look at those settings to see what an administrator is allowing or not allowing to be sent or received from that device. From there, we can check our network-based firewall to see if there is a firewall rule or some other configuration that would prevent this application from communicating. One of the best ways to troubleshoot this type of problem is with a packet capture. We can show that we’ve sent information out and received nothing back in return, which might indicate a problem with the firewall configuration.
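Before reaching for a full packet capture, a quick TCP connection test can give you a first hint. The sketch below uses Python's standard socket module. The function name tcp_port_status and the result strings are assumptions for this example, and the interpretation is only a rule of thumb: silence often suggests traffic is being dropped along the way, while a refusal usually means the packet reached a device that actively rejected it.

```python
import socket


def tcp_port_status(host: str, port: int, timeout: float = 2.0) -> str:
    """Try a TCP connection and describe what came back."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # Nothing came back at all; a firewall may be silently dropping traffic.
        return "no response"
    except ConnectionRefusedError:
        # Something answered and rejected the connection.
        return "refused"
    except OSError as exc:
        return f"error: {exc}"
```

A result of "no response" doesn't prove the firewall is at fault, but it tells you where to start looking.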
In an enterprise, we have hundreds or maybe even thousands of switches that we’re managing. And we have hundreds or thousands of devices plugging into those switches. This makes it very easy to accidentally configure an incorrect VLAN on any one of those individual ports. And if you connect a device, and it’s on the wrong VLAN, it may be assigned the incorrect IP address or have no connectivity on the network. This means the switch administrators should examine exactly what VLANs have been assigned to which individual interfaces on that switch and make sure they match what the proper configuration settings should be. You should also make sure on any trunk connections that you’ve included all of the VLANs that need to traverse that trunk. Usually, a single configuration change can reassign the VLAN for that interface and the user can now communicate on the network.
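Comparing what is documented against what is configured is the kind of task that's easy to script. In this sketch, both inputs are plain dictionaries that map a port name to a VLAN number. How you would gather that information from a real switch is outside the example, and the function name vlan_mismatches is hypothetical.

```python
def vlan_mismatches(documented: dict, configured: dict) -> dict:
    """Return {port: (expected_vlan, actual_vlan)} for ports that differ."""
    mismatches = {}
    for port, expected_vlan in documented.items():
        actual_vlan = configured.get(port)  # None if the port is missing
        if actual_vlan != expected_vlan:
            mismatches[port] = (expected_vlan, actual_vlan)
    return mismatches
```

An empty result means every documented port matches, and anything else points you at the interfaces to correct.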
If you’ve done any work on a help desk or as a network administrator, you’ve probably gotten the phone call that the internet is down. People are not able to communicate, or they’re not able to work on a particular website. If you do some additional troubleshooting, you may find that you’re able to ping an IP address from that person’s workstation, but you’re not able to browse to any fully qualified domain names. And, of course, being able to ping a device by IP address, but not able to browse to that device by its fully qualified domain name may make you think that there’s some type of DNS issue.
The first thing we should check is that the DNS configuration on that local device is configured for the proper server. So this would be a good time to verify the IP address, subnet mask, default gateway, and DNS server IP configuration. We could also go to the command line on this device and perform an nslookup or a dig test to see if we’re able to resolve IP addresses from a fully qualified domain name. And if we think the problem is associated with one DNS server, we may want to change the IP address configuration of this device to point to a different DNS server. For example, if you wanted to use a public DNS server, you could point to Google at 8.8.8.8 or Quad9 at 9.9.9.9.
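The same name resolution test can be done from a script. This minimal Python sketch asks the operating system's configured resolver to look up a name and reports whether it worked. The function name can_resolve is an assumption, and the resolver parameter exists only so the lookup can be swapped out for testing.

```python
import socket


def can_resolve(name: str, resolver=socket.gethostbyname) -> bool:
    """Return True if the name resolves to an address, False if lookup fails."""
    try:
        resolver(name)
        return True
    except socket.gaierror:
        return False
```

If you can ping a server by IP address but can_resolve returns False for its fully qualified domain name, that points toward DNS rather than basic connectivity.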
Another important protocol that often works behind the scenes on our network is NTP, or the network time protocol. We rely on NTP to make sure that all of the devices on our network have the same date and time configured across every single one of those devices. This is especially important for certain cryptographic functions that are used by protocols such as Kerberos, which is used by Active Directory. By default, Kerberos tolerates a maximum difference of five minutes between the clocks of the devices involved, and that’s why NTP is so important to keep everything up to date with the latest time. Kerberos uses these timestamps to determine how old a particular ticket might be. And if these timestamps are wrong on either the server or the client, you’ll find that devices simply can’t log in to the network. Although NTP is often configured automatically in an operating system, you might also want to confirm that the configuration is enabled and that it’s pointing to an appropriate NTP server.
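The five-minute tolerance is easy to picture as a simple comparison of two clocks. This Python sketch assumes the common default of five minutes, and the function name within_kerberos_skew is just an illustrative name.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the commonly used default tolerance of five minutes.
MAX_SKEW = timedelta(minutes=5)


def within_kerberos_skew(client_time: datetime, server_time: datetime,
                         max_skew: timedelta = MAX_SKEW) -> bool:
    """True if the two clocks differ by no more than the allowed skew."""
    return abs(client_time - server_time) <= max_skew
```

A client that is four minutes off would pass this check, while one that is six minutes off would be expected to fail authentication until its clock is corrected.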
BYOD stands for Bring Your Own Device, although you might also see it referred to as Bring Your Own Technology. This means that your employees may own the mobile devices that they carry around with them, but they have a certain portion of the device set aside for business use. By themselves, these phones can be a challenge to secure because you’re going to have both home information and work information on the same device. How will you protect the data that is on this device? And how will you separate data for personal use from data used at work? And from a security perspective, we need to know that if this device is stolen, sold, or traded in, the data that’s on this device remains secure. Most organizations that have BYOD or use any type of mobile device will probably use a Mobile Device Manager, or an MDM. This MDM is a centralized configuration tool that allows IT to set policies on what can and cannot be used on this mobile device.
If you’ve ever configured a firewall, a router, or switch, or some other type of network infrastructure device, you’ll notice there are a number of features that can be enabled and disabled in that device. These features are often based on a series of licenses, and if you pay for the license, then you can enable that feature. This means if you were to look through all of the features on this device, there may be certain features that are not available to you because you’ve not paid for that feature and that license has not been enabled. This can cause problems if you have created a configuration in your lab based on one set of licensed materials, but then when you try to deploy that into production, those license features are not enabled on those production devices, and your configuration fails. So if you’re planning to roll out some configuration changes, it’s useful to examine what is licensed on those remote devices and make sure that all of the configurations in your lab match the licensing for your production equipment.
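A pre-deployment licensing check comes down to comparing two lists. In this Python sketch, the feature names are made up, and the function name missing_licenses is hypothetical. Gathering the real list of licensed features depends entirely on the vendor.

```python
def missing_licenses(required: set, licensed: set) -> set:
    """Return the features the configuration needs that are not licensed."""
    return required - licensed
```

For example, if the lab configuration relies on "vpn" and "ips" but the production device is only licensed for "vpn" and "url-filtering", the result would be {"ips"}, and that is the feature that would fail in production.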
One very common complaint to a help desk or network administrator is that the network is slow. From the user’s perspective, they’re just seeing that the application isn’t performing well. But the actual problem might be one of many different issues that could be occurring on the network. There’s never one single performance value to look at. It’s usually a combination of many different metrics.
For example, if we were looking at the performance of a network-attached storage device, we might see that there are I/O bus metrics, CPU speed, storage access speed, network throughput, memory settings, and many other variables. Any one of these performing poorly could cause the entire device to have poor performance. This means that we have to monitor every single one of these metrics over time to be able to identify where a slowdown may be occurring. And because there are so many different metrics to choose from across so many different components, this can be quite a challenge to try to look at everything that would be important. But if you are monitoring the right things, it should be very easy to see where a problem may be occurring. And if you correct that problem, it should be very obvious when you go back to a very responsive application.
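One simple way to spot a slowdown in a metric you've been collecting over time is to compare the latest reading against its own history. The Python sketch below flags a reading that is far from the historical average. The three standard deviation threshold is an assumption for the example, and it is not the method used by any particular monitoring product.

```python
from statistics import mean, stdev


def is_anomalous(history: list, latest: float, threshold: float = 3.0) -> bool:
    """Flag a reading far from the historical average (needs 2+ samples)."""
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any change at all stands out.
        return latest != baseline
    return abs(latest - baseline) > threshold * spread
```

You would run a check like this separately for each metric, such as storage access time or network throughput, so that you can see which component started behaving differently.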