Network protocols you should care about as a software developer
This is the first post in a series of “things I wish someone had explained in my first year writing code.”
My networks course in school explored the seven layers of the OSI model in detail, from the material inside an Ethernet cable through to the packet specifications for the DNS protocol. I do highly recommend taking a course like this to get an appreciation for it all, but really it’s way too much. You’ll go a long way by knowing the fundamentals in the lists below.
I won’t go into detail about each protocol here, but I hope this serves as a useful mental model and/or reference point for common network communications you’ll encounter in a wide range of fields.
First off, what even is a protocol?
When two computers are communicating with each other, they’re sending lumps of binary data back and forth. To actually read and understand those lumps of binary data, we need to structure them in some agreed-upon way. This is what a protocol does: it precisely defines how the lumps for a particular type of network traffic will be organised.
You use protocols like this in daily life, for example, when you send a letter in the post. The rules are a little looser, but typically you start the letter with the date, your address, the recipient’s address, and then your message. The recipient is expecting this format when they read a letter. The same thing happens in HTTP—an HTTP request to Google will contain a bunch of details including the server you’re requesting information from, your search query, and which data formats you’ll accept for the response. Google uses this information to send an appropriate HTTP response back to you.
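To make the letter analogy a bit more concrete, here is a rough sketch of what a minimal HTTP/1.1 request looks like as raw bytes. The helper names `build_http_request` and `parse_http_request` are made up for this post, and a real browser sends many more headers than this, so treat it as an illustration of the format rather than what Google actually receives.

```python
def build_http_request(host: str, path: str, accept: str = "text/html") -> bytes:
    """Build a minimal HTTP/1.1 GET request as raw bytes (illustrative only)."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, target, protocol version
        f"Host: {host}",         # the server we are requesting information from
        f"Accept: {accept}",     # data formats we will accept in the response
        "Connection: close",
        "",                      # a blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_http_request(raw: bytes) -> dict:
    """Split a raw request back into its request line and headers."""
    head = raw.decode("ascii").split("\r\n\r\n", 1)[0]
    request_line, *header_lines = head.split("\r\n")
    method, target, version = request_line.split(" ")
    headers = dict(line.split(": ", 1) for line in header_lines)
    return {"method": method, "target": target, "version": version, "headers": headers}


request = build_http_request("www.google.com", "/search?q=network+protocols")
print(request.decode("ascii"))
print(parse_http_request(request)["headers"]["Host"])
```

Because both sides agree on this layout (request line first, then `Name: value` headers, then a blank line), the receiver can pull the pieces back out without any guesswork.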
In computer networks, we also have the concept of network layers. The idea is that traffic from a higher layer gets carried inside messages from the layer below it in the networking stack. Think about how after you’ve finished writing your letter, you enclose it in an envelope and follow a different set of rules to instruct the post office how to get it to the right destination. This is not dissimilar to how an HTTP message gets enclosed in a TCP segment.
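The envelope idea can be sketched in a few lines. The header layouts below are toy formats invented for this example (they are deliberately much simpler than the real TCP and IP headers), and the function names are hypothetical. The point is only that each layer prepends its own header and treats everything from the layer above as opaque payload.

```python
import struct

# Toy header layouts (NOT the real TCP/IP formats), all big-endian.
# "Transport" header: source port, destination port, payload length.
TRANSPORT_HEADER = struct.Struct("!HHH")
# "Network" header: source address and destination address, 4 bytes each.
NETWORK_HEADER = struct.Struct("!4s4s")


def wrap_transport(payload: bytes, src_port: int, dst_port: int) -> bytes:
    """Put the application message inside a toy transport-layer envelope."""
    return TRANSPORT_HEADER.pack(src_port, dst_port, len(payload)) + payload


def wrap_network(segment: bytes, src_ip: bytes, dst_ip: bytes) -> bytes:
    """Put the transport segment inside a toy network-layer envelope."""
    return NETWORK_HEADER.pack(src_ip, dst_ip) + segment


def unwrap(packet: bytes) -> bytes:
    """Peel off both envelopes and return the original application message."""
    segment = packet[NETWORK_HEADER.size:]
    _src_port, _dst_port, length = TRANSPORT_HEADER.unpack_from(segment)
    start = TRANSPORT_HEADER.size
    return segment[start:start + length]


letter = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
segment = wrap_transport(letter, src_port=50000, dst_port=80)
packet = wrap_network(segment, bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2]))
print(unwrap(packet) == letter)
```

Notice that `wrap_network` never looks inside the segment it is given, just as the post office never reads your letter.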
So, back onto network layers, starting at the top (i.e. writing the letter).
The application layer is where the interesting stuff lives! These are the specialised protocols that solve problems ranging from email to torrenting files. The most useful ones to understand are:
- DNS — How computers convert human-readable domain names, e.g. www.google.com, into an IP address, e.g. 22.214.171.124. When lower layers in the network are trying to find a computer on the planet, they can only understand IP addresses, so this is a really important step.
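- FTP — A very common way to get files onto and off of a server. If you’ve ever used a web-hosting service you might have used FTP to load your website files onto their servers. Plain FTP is unencrypted, so also see its encrypted alternatives: SFTP (file transfer over SSH, which is a separate protocol despite the name) and FTPS (FTP over TLS).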
- SSH — A way to log in to a remote computer over a network and use a terminal shell. Let’s say you’re running a website on some server in an AWS datacentre — SSH lets you type commands on your laptop that will actually be executing on that remote host. This is very useful for changing service configurations, inspecting logs, or debugging errors on servers that could be anywhere you aren’t.
- TLS/SSL — This is very important as it encrypts your network traffic, making it unreadable to prying eyes. The ‘S’ in HTTPS stands for “secure” and the protocol works by encrypting HTTP traffic using TLS. (You should almost always be using HTTPS, by the way.)
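To give a feel for what one of these application protocols looks like on the wire, here is a sketch that builds a standard DNS query for an A record (an IPv4 address) by hand. The function name `build_dns_query` and the fixed query ID are assumptions made for the example. In practice you would rarely do this yourself, since your operating system or a resolver library handles it.

```python
import struct


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a DNS query message asking for the A record of `hostname`."""
    # 12-byte header: ID, flags (0x0100 = "recursion desired"),
    # then counts: 1 question, 0 answers, 0 authority, 0 additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)

    # The name is sent as length-prefixed labels, ending with a zero byte:
    # "www.google.com" -> 3 "www" 6 "google" 3 "com" 0
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"

    # Question type 1 = A (IPv4 address), class 1 = IN (Internet).
    question = qname + struct.pack("!HH", 1, 1)
    return header + question


query = build_dns_query("www.google.com")
print(len(query), query.hex())
```

Sending these bytes to a DNS resolver on UDP port 53 should get back a response in a similar layout with the answer records appended, though parsing that response is beyond this sketch.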
The transport layer figures out how to transfer stuff from the application layer (e.g. HTTP) between two application processes on different host computers (e.g. your browser and Google’s web server). There’s a bit of a duopoly in this space:
- TCP — dependable, operates with military precision. Given two computers that want to send a bunch of bytes to each other, it ensures that every byte makes it intact, in the right order, and it uses congestion control techniques to be courteous on the network. Applications use this when communications need to be reliable (all of the application layer protocols I listed above, except for DNS, use TCP).
- UDP — the reckless and free-spirited brother. UDP fires out packets of data to a remote host but there’s no guarantee they will arrive! This may sound useless, but what you lose in reliability you gain in speed. Applications that need to live-stream video or audio will often use UDP because it reduces the time it takes for frames to travel between you and your Grandma. (When your connection cuts out on a call it might be because a few UDP packets just got lost.) DNS is another protocol that mostly uses UDP.
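Here is a small sketch of how the two look from a program’s point of view, using Python’s standard `socket` module and sending a message to ourselves over the loopback interface. The function names `udp_roundtrip` and `tcp_roundtrip` are made up for this post. On loopback UDP packets practically never get lost, so this shows the difference in ceremony (connectionless datagrams versus an established connection) rather than the difference in reliability.

```python
import socket


def udp_roundtrip(message: bytes) -> bytes:
    """Send one datagram to ourselves: no connection, no delivery guarantee."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0 = let the OS pick a free port
        receiver.settimeout(2.0)         # give up if the packet never arrives
        sender.sendto(message, receiver.getsockname())  # fire and forget
        data, _addr = receiver.recvfrom(4096)
        return data
    finally:
        sender.close()
        receiver.close()


def tcp_roundtrip(message: bytes) -> bytes:
    """Send bytes to ourselves over a TCP connection: ordered and reliable."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        listener.settimeout(2.0)
        client.settimeout(2.0)
        client.connect(listener.getsockname())  # handshake happens here
        conn, _addr = listener.accept()
        with conn:
            conn.settimeout(2.0)
            client.sendall(message)
            client.shutdown(socket.SHUT_WR)  # signal that we are done sending
            chunks = []
            while True:
                chunk = conn.recv(4096)      # TCP is a byte stream, so read
                if not chunk:                # until the sender closes its side
                    break
                chunks.append(chunk)
            return b"".join(chunks)
    finally:
        client.close()
        listener.close()


print(udp_roundtrip(b"hello over UDP"))
print(tcp_roundtrip(b"hello over TCP"))
```

The UDP version needs no setup between the two sockets at all, while the TCP version has to connect and accept before any data moves. That extra handshake is part of the price you pay for reliability.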
OK, so we’ve packaged up our application information (let’s say HTTP) and it’s hitching a ride on TCP. Now what? We need something that knows how to take that bundle of goodness and deliver it to the right address, wherever the server with that address is on the planet. That is the job of the network layer, and on the Internet that means IP (the Internet Protocol).
The bottom line here is that most of the Internet still runs on IPv4, but we’re running out of available addresses because the addresses are only four bytes long (e.g. 126.96.36.199). IPv6 is being rolled out, bringing some security and performance improvements, but thankfully it’s also adding approximately 3.4×10^38 new addresses.
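If you want to check those address-space numbers yourself, Python’s standard `ipaddress` module can do the arithmetic. The two example addresses below come from ranges reserved for documentation and are used purely for illustration.

```python
import ipaddress

# The whole IPv4 space is 2**32 addresses; the whole IPv6 space is 2**128.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_total = ipaddress.ip_network("::/0").num_addresses

print(ipv4_total)           # 4294967296
print(f"{ipv6_total:.1e}")  # 3.4e+38

# An IPv4 address really is just four bytes; an IPv6 address is sixteen.
v4 = ipaddress.ip_address("192.0.2.1")    # documentation-only example address
v6 = ipaddress.ip_address("2001:db8::1")  # documentation-only example address
print(len(v4.packed), len(v6.packed))     # 4 16
```

Four billion or so addresses sounded like plenty in the early days, but it is fewer than the number of people on the planet, let alone the number of devices.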
The layers introduced here align with the top of the Internet protocol suite. Going deeper, things start to get more physical: you’ll encounter technologies like Ethernet, Wi-Fi and routers, which are responsible for actually transmitting all of this data around your home and across the globe. Do check them out, but in your day-to-day work as a software developer this round-up should cover most of your bases.