The Internet Protocol is one of the most popular protocols in the world. In this video, you’ll learn how IP, TCP, UDP, and port numbers are used to transfer information over our modern networks.
If you need to move a box from one place to the other, a common way to accomplish that is to get a moving truck. We can put boxes in the moving truck, move that box to another location, and unload the moving truck. In the case of networking, the road that we’re driving on is the network. So if you’ve got a wireless network, a DSL network, a cable modem network, or you have an ethernet switch, that is the road that we’re using to transport this particular package.
In this metaphor, the truck itself is what we will consider Internet Protocol or IP. We’ve designed all of these networks so that this type of truck can move most efficiently between one point and another. In each moving truck is a box, and inside of that box is your data. The moving truck has no idea what’s inside of that box. All it knows is that it has one job, which is to put the box in the back of the truck, transport it across the network, and unload that box on the other side. If you were to open up the box, there would be a certain type of information inside. It may be specific to a certain application or a particular function on your network.
You can think of the information in this box as belonging to a particular room in your house. So when the box is delivered, that box needs to be moved to the bathroom, the kitchen, or the living room. It’s this encapsulation process that allows us to move all kinds of data across the network. We have application data that we put inside of a box. We place that box inside of a truck. That truck then moves across the network, and on the other side, we remove the box from the truck. We open up the box and we take out the application data.
If you were to look at this visually, this is the way that our networks operate. On one side may be your device, such as a workstation, a client, or a laptop computer, and on the other side is a server. This could be a web server or a mail server or any other type of server that you’d be connecting to. This client is going to send information to this server. And if this is across an ethernet network, then everything carried inside of that ethernet frame is what we call our ethernet payload. In ethernet, we also have a header at the beginning of this particular frame and an ethernet trailer that’s at the end of the frame.
This ethernet payload could have anything inside of it. But as you’re probably aware, the most popular protocol we use on our networks is the Internet Protocol or IP. So our ethernet payload will have an IP header and then there will be an IP payload within that particular part of the frame. Obviously the IP payload has information inside of it. This could be TCP data with a TCP header and a TCP payload. And as you probably can expect, that TCP payload can be separated out into different types of data. So for this entire ethernet frame, inside we have IP. Inside of IP we have TCP, and inside of TCP we have HTTP data.
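If it helps to see that nesting in code, here is a minimal Python sketch. It is only a toy model: the bracketed byte strings are placeholder labels rather than real ethernet, IP, or TCP header layouts, and the `wrap` and `unwrap` helpers are hypothetical names made up for this illustration.

```python
# Toy model of encapsulation: each layer wraps the payload handed down to it.
# The bracketed labels stand in for real headers and are purely illustrative.

def wrap(header: bytes, payload: bytes, trailer: bytes = b"") -> bytes:
    """Place a payload between a header and an optional trailer."""
    return header + payload + trailer


def unwrap(data: bytes, header: bytes, trailer: bytes = b"") -> bytes:
    """Strip a known header and trailer and return the payload inside."""
    if not (data.startswith(header) and data.endswith(trailer)):
        raise ValueError("data does not carry the expected header/trailer")
    return data[len(header):len(data) - len(trailer)]


http_data = b"GET / HTTP/1.1\r\n\r\n"
tcp_segment = wrap(b"[TCP]", http_data)                 # HTTP inside TCP
ip_packet = wrap(b"[IP]", tcp_segment)                  # TCP inside IP
ethernet_frame = wrap(b"[ETH]", ip_packet, b"[FCS]")    # IP inside ethernet

# The receiving side removes the layers in the reverse order.
inner_ip = unwrap(ethernet_frame, b"[ETH]", b"[FCS]")
inner_tcp = unwrap(inner_ip, b"[IP]")
received = unwrap(inner_tcp, b"[TCP]")
print(received == http_data)
```

The truck never needs to understand what is in the box: each `unwrap` call only looks at its own header and hands the rest along untouched.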
Let’s drill down into this IP packet and look at TCP and UDP. TCP and UDP are transported inside of that IP packet. We commonly say that they are encapsulated within IP. And they are two very common ways to move data from one part of the network to the other. You may be using TCP for some applications and UDP for other applications. You might also hear someone refer to TCP or UDP as operating at the transport layer of the OSI model. Sometimes we refer to this as OSI layer 4.
You may think that IP is all you would need to be able to move data from one part of the network to the other, and in many ways you would be correct. But TCP and UDP add additional capabilities that IP can’t provide. For example, these provide multiplexing so that you can have many different applications on your system communicating to a separate server all simultaneously. So your workstation is sending a lot of information for a lot of different applications to the server and the server is able to determine what applications are in use through the use of this multiplexing.
Let’s really break down the difference between TCP and UDP. TCP stands for the Transmission Control Protocol. We often refer to this as a connection oriented protocol. That’s because there is a formal process to set up the flow from one device to the other and a formal process to tear down that flow when the conversation is over. We sometimes refer to TCP as reliable delivery. This doesn’t mean that TCP somehow works better or faster than other protocols on the network. It means that TCP has a built in system to ensure that data that has been sent has been verified as being received on the other side. This provides us with a number of different features behind the scenes.
And one of the most important is that TCP can reorder messages that may have been received out of order, which sometimes can happen on networks that have multiple links to a single location. And TCP can manage a retransmission process so that if any data is not received by the destination, that information can be resent from the source. TCP also has a flow control mechanism so that if a device feels that it’s receiving information too quickly, it can tell the other side to slow down the process so that information can be received at a more reasonable rate.
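To make the reordering and retransmission idea a little more concrete, here is a simplified Python sketch. It assumes segments are numbered 0, 1, 2, and so on. Real TCP numbers individual bytes and uses acknowledgments and timers, so treat this as an illustration of the concept and not of the protocol itself. The `reassemble` function is a hypothetical helper.

```python
def reassemble(segments):
    """Return (data deliverable in order, sequence numbers still missing).

    segments is a list of (sequence_number, data) pairs that may arrive in
    any order. Only the contiguous run starting at 0 is handed to the
    application; anything after a gap has to wait for the gap to be filled.
    """
    received = dict(segments)
    delivered = []
    next_seq = 0
    while next_seq in received:
        delivered.append(received[next_seq])
        next_seq += 1
    highest = max(received, default=-1)
    missing = [seq for seq in range(next_seq, highest + 1) if seq not in received]
    return b"".join(delivered), missing


# Segments 2, 0, and 3 arrive out of order; segment 1 was lost on the way.
arrived = [(2, b", "), (0, b"Hel"), (3, b"world")]
partial, missing_before = reassemble(arrived)

# The sender retransmits segment 1, and everything can now be delivered.
arrived.append((1, b"lo"))
complete, missing_after = reassemble(arrived)
print(missing_before, complete)
```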
UDP is the User Datagram Protocol. With TCP we had a connection oriented flow, but UDP is a connectionless flow. There’s no formal process to set up a traffic flow and there’s no formal process to tear that down at the end. UDP simply sends data from one place to the other. And it’s a very simple transaction to be able to send information across the network.
Because there’s no acknowledgment being sent by the destination device, we refer to UDP as unreliable. Again, this doesn’t mean that UDP does not work as well as any other protocol on the network. It only means that we have no receipt or any knowledge that the information that we’ve sent was really received by the device on the other side. With TCP, we had a way to regulate the flow of communication across the network. But because this is a single conversation between one device and another, there’s no flow control on UDP and no way to determine whether a device should slow down or send information faster.
If you compare these two protocols, it does sound like UDP is not as functional and therefore may not be the best choice for sending information over our network. But in reality, UDP plays a very important role in being able to send information very quickly over the network. UDP is most associated with real time communication, communication where you can’t stop, retransmit information, and then catch up with yourself. If you’re on a phone call, there’s no way to rewind the conversation and send a packet that may have been missed a second or two ago. With UDP, we simply send the data. If it makes it to the other side, then we were successful. If the information was dropped along the way, we simply keep the conversation going.
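Here is a small Python sketch of that "simply send the data" behavior, using the standard `socket` module. It assumes both ends run on the same machine over the loopback address 127.0.0.1, and the message contents are made up for the example. Notice there is no setup step before the send and no acknowledgment afterward.

```python
import socket

# Receiver: bind to the loopback address and let the OS pick a free port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)
server_address = receiver.getsockname()

# Sender: no connection setup and no teardown, just send the datagram.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"voice sample 1", server_address)

try:
    data, source = receiver.recvfrom(2048)
except socket.timeout:
    # UDP itself never tells the sender that a datagram was lost.
    data, source = None, None
finally:
    sender.close()
    receiver.close()

print(data, source)
```

On a loopback interface the datagram will almost always arrive, but nothing in UDP promises that. If it were dropped, the sender would never find out.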
One example of a protocol that uses this connectionless form of communication is DHCP, which is the Dynamic Host Configuration Protocol. This is the protocol that we use to automatically assign IP addresses to our devices. Another protocol that uses UDP as its transport mechanism is TFTP, or the Trivial File Transfer Protocol. Since both of these protocols are using UDP, information is simply sent across the network with no type of acknowledgment from UDP that the data was received on the other side.
It’s up to the application, therefore, to keep track of who has received information and who has not received information. So in the example we gave before, DHCP is responsible for making sure that information has been received by the other side. So if it sends information and doesn’t receive a response, DHCP is responsible for resending that data over the network. With TCP, we receive an acknowledgment for any packets that are sent over the network.
Examples of protocols that take advantage of this return receipt functionality are HTTPS, the Hypertext Transfer Protocol Secure, commonly used to send information in our web browsers, and Secure Shell, or SSH, which provides us with an encrypted form of terminal communication between our systems. If our HTTPS data between a web server and a client somehow loses a packet between point A and point B, TCP will recognize that packet was missing. It will ask to retransmit that information and the retransmitted data is sent over the network. All of this happens automatically with TCP, so HTTPS and SSH don’t have to worry about managing the process of getting data from one side to the other.
So far in our moving truck metaphor, we know that we have our IP delivery truck. This truck is moving information from one physical address. In the world of networking, this is one IP address. And it’s delivering it to another IP address. Just as every house that’s on your block has a unique mailing address, every computer that’s inside of your network has a unique IP address. At this point, our moving truck IP has taken information from one IP address and moved it to another IP address. Once that box is received at the destination IP address, there is more information that needs to be examined on the label to determine where that box is to go inside of the house. Inside of your house, for example, you have many different rooms. There’s a bathroom, a kitchen, a living room, and a bedroom.
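When the transport is UDP, that resending logic has to live in the application. The Python sketch below shows the general shape of it with a simulated network that drops the first few requests. The function names and the `DISCOVER` and `OFFER` strings are only illustrative labels. This is not the real DHCP message format or its actual retry timing.

```python
def send_with_retry(send_once, message, max_attempts=4):
    """Call send_once(message) until a reply arrives or attempts run out.

    send_once is a hypothetical callable that returns the reply, or None
    when nothing came back before its timeout expired.
    Returns (reply, number_of_attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        reply = send_once(message)
        if reply is not None:
            return reply, attempt
    return None, max_attempts


def make_lossy_channel(drops):
    """Simulated network that silently loses the first `drops` requests."""
    state = {"remaining": drops}

    def send_once(message):
        if state["remaining"] > 0:
            state["remaining"] -= 1
            return None          # request lost, so no reply ever arrives
        return b"OFFER for " + message

    return send_once


reply, attempts = send_with_retry(make_lossy_channel(2), b"DISCOVER")
print(reply, attempts)
```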
And when this box is delivered, we have to determine what room is going to receive that box. In the case of TCP and UDP, there’s an additional piece of information that’s added to all of those conversations, and that piece of information is a port number. That port number determines what room in the house is going to receive this data. Or in the case of an actual server, what application on the server is going to receive this data. The port number’s written on the outside of the box.
So when the box is received at the front door, we can look at the box and see, oh, that box needs to go to the bedroom. In the case of port numbers, each room has a number and we know that the bedroom is port 80, we have a living room of port 443, our bathroom is port 25, and the kitchen will be port 123. When we receive the box at the front door, we look at the port number. It says port 80. So we can move this box inside of the house and deliver it into the bedroom.
In the case of our server, we have four different services running on the same IP address. We have a web server sending unencrypted data on port 80. We have a web server sending encrypted data on port 443. We’ve got a mail server on port 25 and a time server on port 123. When this packet is received by our IP address at the front door, we examine the port number.
And if the port number says this is for the service running at port 443, that packet is delivered into that service that’s running on that device. This is where the multiplexing feature comes from that I mentioned earlier. This front door is going to be receiving a lot of boxes destined for this IP address and the port number allows us to know exactly what service running on this device will be receiving that data.
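The front door lookup can be sketched as a simple table in Python. The four port numbers are the ones from this example, while the `deliver` function and the service descriptions are hypothetical names used only for illustration.

```python
# One IP address, four services, each reachable through its own port number.
SERVICES = {
    80: "web server (unencrypted)",
    443: "web server (encrypted)",
    25: "mail server",
    123: "time server",
}


def deliver(destination_port: int) -> str:
    """Return the service that should receive data sent to this port."""
    return SERVICES.get(destination_port, "no service listening on this port")


for port in (443, 25, 8080):
    print(port, "->", deliver(port))
```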
So to complete this traffic flow between these two devices, we need some information. The first would be the server’s IP address, then the protocol that would be in use, such as TCP or UDP, and for that protocol, the port number of the server application. In the example we had before, we had the house that had four different ports running inside of that house: port 80, port 443, port 25, and port 123. Those are all associated with the server IP address, the protocol, and the application port numbers. The client communicating with that server also has an IP address. It’s communicating using TCP or UDP. And there are port numbers that it is using to send that data so that when a response is received, we know exactly what that response is associated with.
It’s important on our server that the port numbers we’re communicating with are well known. For example, if the browser on a client wants to communicate with a web server, we know that that web server commonly uses TCP port 80 and TCP port 443. Every web server we communicate with will use those same port numbers so that we know exactly where that service is located on that IP address.
Because these port numbers are usually permanent, we refer to them as non-ephemeral ports. This means they are non-temporary port numbers that tend to be the same every time we access that device. Port number 80 is commonly associated with HTTP. Port number 443 is commonly associated with HTTPS. And if you go to any web server, that’s usually the port numbers that will be in use. If you were to look at those port numbers, they’re commonly between port 0 and port 1023, but these port numbers can really be anything as long as they’re port numbers that are commonly known and well known across multiple devices.
When you’re communicating to the server, you need a port number on your device that you can associate with this particular traffic flow. These are usually temporary port numbers, and once that traffic flow is over, we will no longer use that port number. We refer to these as ephemeral ports or temporary port numbers. And commonly an operating system will assign a port number between 1,024 and 65,535. But this is often configured in the operating system itself and it’s assigned in real time as you’re using these applications.
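As a rule of thumb, those two ranges can be written down in a few lines of Python. The `classify_port` function is a hypothetical helper that uses only the ranges mentioned here. Actual operating systems often use a narrower ephemeral range, and a service can be configured to listen on any port.

```python
def classify_port(port: int) -> str:
    """Label a port using the common ranges described in this video."""
    if not 0 <= port <= 65535:
        raise ValueError(f"{port} is not a valid TCP or UDP port number")
    if port <= 1023:
        return "commonly non-ephemeral (well-known range)"
    return "commonly ephemeral (temporary range)"


for port in (80, 443, 3000, 65535):
    print(port, "->", classify_port(port))
```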
If you were to look at a protocol decode of these conversations, both TCP and UDP can therefore use any port number between 0 and 65,535. As we mentioned, most services are going to use non-ephemeral or non-temporary port numbers, but that’s not always the case. There are some applications that use dynamic port numbers that can change dramatically from one device to the other. Just keep in mind that this is simply a number associated with that service. And if we know that, then we’re able to communicate with that service and have a conversation to send data.
You might also think that you could change the port number on the server to something that is not well known and that would be more secure, because it might hide the application or keep other people from accessing that application. But port numbers are not designed to be a security mechanism. They’re simply designed to allow you to access those services on that particular device. It’s relatively easy to use a port scanner to find all of the open ports on a particular server and then begin to do more research to determine what service is really running on that port.
This means when you access all of the different sites that you visit on the internet that all of those sites are going to be using the same port numbers, which are well known. This allows you to simply type in the name of the website and you’re immediately connected to that site and able to transfer data. If all of those different websites used completely different port numbers, we would have to have another mechanism in place to somehow determine what the appropriate port number might be for that individual site. You can see why having well known port numbers makes the process so much simpler.
Although these port numbers can range between port 0 and port 65,535, TCP has its own set of port numbers that are different than UDP’s port numbers. This means that there could be a service running on TCP port 80 but a completely different service running on UDP port 80. As you can imagine, having one service running on TCP port 80 and another service running on UDP port 80 could be a bit confusing, which is why we don’t tend to do that in normal operation.
So let’s take a scenario from the picture we looked at earlier. We have a client on the left side. Its IP address is 10.0.0.1. It is communicating to a server on the other side, whose IP address is 10.0.0.2. And you can see there is web server traffic communicating over TCP port 80 to this server. This server is also a voice over IP server, with traffic communicating over UDP port 5004. This server is also an email server communicating over TCP port 143.
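One way to picture those separate sets of port numbers is a table keyed by both the protocol and the port. This Python sketch is only a model of the idea. The `listen` function and the service names are hypothetical and do not open any real sockets.

```python
# Listeners are identified by (protocol, port), not by the port alone.
listeners = {}


def listen(protocol: str, port: int, service: str) -> None:
    """Register a service, refusing a second one on the same protocol and port."""
    key = (protocol, port)
    if key in listeners:
        raise ValueError(f"{protocol.upper()} port {port} is already in use")
    listeners[key] = service


listen("tcp", 80, "web server")
listen("udp", 80, "some unrelated service")   # allowed: a different protocol

print(listeners[("tcp", 80)], "/", listeners[("udp", 80)])
```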
We want to send information from this client to the server. We’ll send web server traffic with HTTP data inside of it. We’ll send VoIP traffic. There’s the VoIP data inside of that packet. And email traffic with email data inside of the packet.
If we were to look at this a little bit closer, we would see the comparison of port numbers on both sides of the conversation. So we have these two devices, the client and the server, and you can see the source IP address of 10.0.0.1. That’s our client. And the destination IP address of 10.0.0.2, which is our server. When the client wants to send data to the server, it’s sending HTTP data. So we know we’re going to be sending data to a destination port of port 80. For our web traffic, we know that that’s going to use TCP data and we know the well known port for web traffic for the server is TCP destination port 80.
But we need some port number to send this information from. So this client will pick a random port number. And in this example, this client picked the random port number of 3,000. So the TCP source port is 3,000 heading to a TCP destination port, which is a well known port, port 80, on the server to be able to send the HTTP data.
At the same time, this client wants to communicate to the server using voice over IP. Our source and destination IP is the same. You can see that UDP is being used in this scenario because we are using VoIP traffic. And VoIP traffic uses UDP. The destination port is 5,004, which is the well-known port number for this VoIP server. And we picked a random port number to send this traffic over UDP using port 7,100.
The same thing applies for the third conversation that’s occurring simultaneously where the source and destination IP address in this example are identical. The destination port number is TCP port 143 because this is email traffic. And this client picked a random source port of TCP port 4407 to send this email data. You can see that we’re sending a lot of information simultaneously across the network. But because we’re using IP addresses and port numbers, the server knows exactly where this traffic goes once it’s received by this destination device.
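The three conversations in this scenario can be written out as data. In the Python sketch below, the addresses and port numbers come straight from the example, while the `Flow` type and the `reply_for` helper are hypothetical names used for illustration.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    protocol: str
    source_ip: str
    source_port: int
    destination_ip: str
    destination_port: int


flows = [
    Flow("tcp", "10.0.0.1", 3000, "10.0.0.2", 80),    # web traffic
    Flow("udp", "10.0.0.1", 7100, "10.0.0.2", 5004),  # VoIP traffic
    Flow("tcp", "10.0.0.1", 4407, "10.0.0.2", 143),   # email traffic
]


def reply_for(flow: Flow) -> Flow:
    """A response swaps the source and destination of the original flow."""
    return Flow(flow.protocol, flow.destination_ip, flow.destination_port,
                flow.source_ip, flow.source_port)


# Same two IP addresses every time, yet each conversation stays distinct.
print(len(set(flows)))
print(reply_for(flows[0]))
```

Because the response comes back addressed to the client's source port, the client knows which of its three conversations that response belongs to.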