Part Three

The Network

Parts I and II built one machine — sand to transistor, kernel to language. Part III steps off the motherboard. The voltage that flips a transistor inside a CPU is, at a different scale, the voltage that crosses a continent. We follow the wire out of the chip and into the world: the physics of the substrate, the architectural decision to send packets rather than circuits, the protocol that pretends a noisy network is reliable, the conventions that make a "website" work, and the language that runs in every browser on Earth. By the end you will be able to read every layer of an HTTPS request from voltage to JSON.

CHAPTER 08
The Wire
Physics of the substrate — copper, fiber, radio. Shannon's information theory. Encoding. Ethernet. The OSI and TCP/IP layer models that organise everything above.
CHAPTER 09
Packets — How the Internet Decided to Work
Circuit-switching vs packet-switching. ARPANET 1969. The IP header byte-by-byte. BGP and the planet-scale graph nobody owns. IPv4 → IPv6. Spoofing and hijacking.
CHAPTER 10
TCP — The Problem of Reliability
Reliability built on top of unreliable. The three-way handshake. Flow and congestion control as control theory. CUBIC, BBR, QUIC. SYN floods.
CHAPTER 11
HTTP, DNS, TLS — The Web
Berners-Lee 1989. What "a website" is at the protocol level. DNS as the internet's phone book. TLS handshake step-by-step. HTTP/2, HTTP/3, QUIC.
CHAPTER 12
JavaScript — The Language That Shouldn't Have Worked
Brendan Eich's ten days. The event loop. Node.js. The DOM. Web security: same-origin, XSS, CSRF, CSP.
Chapter 08

The Wire

The voltage that flips a transistor inside a CPU is, at a different scale, the voltage that crosses a continent. We follow the signal out of the chip and into the world: copper, fiber, radio. The mathematics of how much information any of them can carry. The encodings that turn ones and zeros into wave shapes. The local-network protocols that share the substrate. And the layer model that lets every conversation about networking refer back to a single picture.

Topics: Substrates · Shannon · encoding · Ethernet · OSI
Era covered: 1948 → present
01 — Bridge from Part I

The voltage on a transistor extends to the voltage on a continent.

Part I closed inside a CPU. Part III leaves the CPU. The same voltage transition that switched a transistor in Chapter 1 — a small step from one level to another, interpreted by everything downstream as "the bit changed" — now travels: across the motherboard, out a network interface, down a cable, across an ocean, into another machine on the other side of the planet. The physics is unchanged. What grows is the discipline required to keep the signal intact over that distance.

A wire, electrically, is just a conductor with two ends. Push voltage in at one; some attenuated, slightly delayed version of that voltage shows up at the other. The delay is a function of length and cable type — roughly 5 ns per metre on both copper and fibre, since signals in each travel at about two-thirds the speed of light. The attenuation is a function of frequency, distance, and material. The noise added along the way is a function of everything else in the universe nearby: lightning, fluorescent lights, microwave ovens, other wires, cosmic rays. A one-metre USB cable inside a desk hides all of this. A two-thousand-kilometre undersea cable cannot.

The story of the network is, at its core, the engineering of scaling this single phenomenon. How do you keep a voltage signal recognisable after a kilometre of copper? After a kilometre of fibre? After a hundred kilometres of either? After a wireless link through a forest in the rain? The answers come from physics, mathematics, and a great deal of practitioner cleverness. The voltage on a wire is the substrate; everything else in the next five chapters is what we build on top of it.

Fig 8.1 — One signal, two scales: the same physics, separated by fifteen orders of magnitude in distance — a ~5 nm transistor channel inside a CPU vs ~5 000 km of repeatered cable under the ocean.

A bit travelling between two transistors inside a CPU and a bit travelling between New York and London are the same physical phenomenon — a voltage transition propagating along a conductor — observed at scales fifteen orders of magnitude apart. The CPU version completes in roughly a nanosecond and is barely degraded. The transatlantic version takes about 28 milliseconds, passes through dozens of optical amplifiers, and arrives noticeably degraded — but still recognisable as the original bit pattern. Everything in this chapter is the engineering that makes the second version possible.

02 — Three substrates

Electrons in copper, photons in glass, waves through air.

All networking happens over one of three substrates. Copper carries electrons. Fibre carries photons. Wireless carries electromagnetic waves through space. They look very different. They all do the same thing: transport bits as variations in some physical quantity that someone at the other end can measure.

Copper is the original. A pair of conductors, twisted together to cancel out external electromagnetic interference (this is the "twisted pair" in Cat 5 and Cat 6 cable). The signal is voltage between the two wires. Cheap to manufacture, easy to terminate, but attenuates fast — by about 30 dB per kilometre at the frequencies modern Ethernet uses, which limits a single copper run to roughly 100 metres before a switch or repeater is required. Almost every cable inside a house or office is copper.

Fibre is glass. The signal is light, pulsed at frequencies around 200 THz. The light is launched into a thin glass core surrounded by a cladding with a slightly lower refractive index, so total internal reflection traps the light inside the core all the way to the receiver. Attenuation is dramatically lower — modern fibres lose about 0.2 dB per kilometre, allowing tens to hundreds of kilometres between repeaters. Every undersea cable, every long-distance internet trunk, every data-centre backbone is fibre. The first transatlantic fibre cable, TAT-8, went into service in 1988 and carried 280 megabits per second — at the time, more than the combined capacity of every previous transatlantic cable in history.
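To make the decibel figures concrete, here is a minimal Python sketch that converts the per-kilometre losses quoted above into the fraction of signal power that survives a run:

```python
# Fraction of signal power surviving a cable run, from attenuation in dB.
# Rough per-km losses quoted in this section: ~30 dB/km for copper at
# Ethernet frequencies, ~0.2 dB/km for modern fibre.

def surviving_fraction(db_per_km: float, km: float) -> float:
    """Power out / power in after `km` of cable."""
    return 10 ** (-(db_per_km * km) / 10)

for medium, loss in [("copper", 30.0), ("fibre", 0.2)]:
    for km in (0.1, 1.0, 100.0):
        frac = surviving_fraction(loss, km)
        print(f"{medium:6s} {km:7.1f} km -> {frac:.3e} of the power remains")
```

One kilometre of copper leaves a thousandth of the power; one kilometre of fibre leaves about 95%. That ratio is the whole reason the long-haul internet is glass.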

Wireless is the absence of any medium at all — bits as variations in an electromagnetic wave propagating through space. Wi-Fi runs around 2.4 GHz and 5 GHz; cellular at 700 MHz to several GHz; satellite up to tens of GHz. The trade is fundamental: no wire to install, but every receiver in range hears every transmission, the medium is shared with every microwave oven and every Bluetooth device, and atmospheric absorption is a factor at the higher frequencies. The mathematics of how to share that single medium efficiently is the bulk of cellular and Wi-Fi design.

🛡️

The substrate is the first attack surface. Each of these three media leaks differently. Copper wires emit small amounts of electromagnetic radiation that a receiver nearby can demodulate back into the original signal — the basis of TEMPEST, a US classification programme dating to the 1960s for shielding sensitive equipment against precisely this. (TEMPEST-style attacks against unshielded VGA cables can be performed with off-the-shelf radios from across a room, and with more capable modern software-defined radios, from across a parking lot.) Optical fibres are harder but not immune: bending a fibre slightly leaks a small fraction of its light through the cladding, and the Snowden documents in 2013 revealed that the NSA and GCHQ had been systematically tapping undersea fibre cables at landing stations for years. Wireless is hardest of all to secure: every transmission goes to every receiver in range. The whole story of Wi-Fi security — WEP (Wired Equivalent Privacy, 1997, comprehensively broken by 2001), WPA (2003, broken 2008), WPA2 (2004, broken 2017 by the KRACK attack), WPA3 (2018, still standing) — is the engineering response to the fact that on a radio link, the attacker is always already in the room.

Fig 8.2 — Three substrates, one job: copper carries electrons (~30 dB/km, 100 m runs, houses and offices), fibre carries photons (~0.2 dB/km, hundreds of km, undersea and backbone), wireless carries EM waves (2.4–60 GHz, shared and noisy). Every bit you send travels over one of these three, often all three in succession.

A request from your laptop to a server in Tokyo crosses all three substrates — Wi-Fi from laptop to router, copper or fibre from router to ISP, fibre across the Pacific, more fibre into the destination data centre, finally copper or fibre to the server's NIC. Each hop is a different combination of physical medium and encoding. The application layer never sees any of it; it sees a TCP socket, which sees an IP route, which sees a series of link-layer frames, which sees, finally, a stream of voltage transitions or photon pulses or radio symbols. The whole stack exists to abstract away which substrate is in use at any moment.

03 — Shannon's information theory

1948. Bell Labs again. The number behind every wire.

In July and October of 1948, Claude Shannon — a thirty-two-year-old mathematician at Bell Labs — published "A Mathematical Theory of Communication" in two parts of the Bell System Technical Journal. It is the founding document of information as a quantifiable thing. Before this paper, "amount of information" was a metaphor. After it, information had units (bits) and a formula (entropy). And Shannon proved a result that still defines every modern network: every channel has a maximum bit rate, you can send reliably below it, and you cannot send reliably above it.

Shannon himself is one of the strangest figures in twentieth-century science, and worth knowing. He grew up in Gaylord, Michigan, the son of a probate judge and a high-school principal. He arrived at MIT in 1936 as a graduate student and, in a master's thesis written the next year, did something that no one had thought to do: he showed that the ideas of George Boole's nineteenth-century algebra of logic could be implemented as electrical relay circuits, and that any logical proposition could therefore be computed by a network of switches. Howard Gardner later called this "possibly the most important master's thesis of the twentieth century." Shannon was twenty-one. The thesis is the bridge between Chapter 2 of this book and everything that followed it: it is the moment Boolean logic stopped being philosophy and started being engineering.

He stayed strange. At Bell Labs through the 1940s, where he wrote the information theory paper essentially on his own (the whole field sprang from one head, in one paper, more or less complete), he was known for riding a unicycle through the corridors while juggling. He built an electromechanical mouse named Theseus that could solve a maze and remember the solution — arguably the first artificial learning machine, 1950. He later built rocket-powered pogo sticks, flame-throwing trumpets, and a chess-playing automaton. The papers he wrote in his "spare time" included the first paper on computer chess, the first paper on cryptography as information theory (written during the war and declassified in 1949), and, decades later, a playful mathematical analysis of the Rubik's cube. He worked alone, refused most academic politics, and was difficult to interview. He died in 2001, having lived to see the entire networked world he had given the mathematics to. There is no later figure as singular.

The key definition in his 1948 paper, called entropy after the thermodynamics quantity it resembles, measures how much "surprise" — how much information — is contained in a probability distribution. For a discrete source emitting symbols with probabilities p₁, p₂, …, pₙ, the entropy is:

Shannon entropy

H = −Σ pᵢ log₂(pᵢ)

Measured in bits per symbol. A fair coin (½, ½) has H = 1 bit per flip — maximum surprise. A coin that always lands heads (1, 0) has H = 0 — no information. A biased coin (¾, ¼) has H ≈ 0.81 bits — between the two. The deeper insight is that this number is also the minimum average code length any lossless compression scheme can achieve, asymptotically. Shannon proved you can compress a source down to its entropy, and no further. ZIP, gzip, JPEG, MP3, H.264, every modern compressor — all sit somewhere on Shannon's curve, fighting for the last bit.
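A few lines of Python reproduce the coin numbers above. This is a sketch of the definition itself, not of any particular compressor:

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits per symbol: H = -sum(p * log2(p))."""
    return -sum(p * log2(p) for p in probs if p > 0)  # 0·log 0 -> 0 by convention

print(entropy([0.5, 0.5]))    # fair coin    -> 1.0
print(entropy([1.0, 0.0]))    # always heads -> 0.0
print(entropy([0.75, 0.25]))  # biased coin  -> ~0.811
print(entropy([0.9, 0.1]))    # 90/10 source -> ~0.469, as quoted for Fig 8.3
```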

Fig 8.3 — Entropy of a binary source as the probability of "1" varies from 0 to 1: zero at the extremes, a maximum of one bit per symbol at the fair coin, ~0.81 bits at ¾/¼.

Entropy of a binary source as the probability of "1" varies from 0 to 1. At the extremes (always 0 or always 1) there is no surprise — every bit is predictable, so each symbol carries zero information. At p = ½ — a fair coin — every flip is maximally surprising and carries one full bit of information. The curve is symmetric: a 70% / 30% source carries the same entropy as a 30% / 70% source, since "uncertainty" doesn't care which side you bet on. This single curve is the lower bound on lossless compression for any binary source. Compress a fair coin and you cannot do better than one bit per flip. Compress a 90/10 source and the theoretical minimum is about 0.47 bits per flip — actually achievable by arithmetic coding to within a few percent.

Shannon's second result, the noisy channel coding theorem, put a number on what a wire can carry. Given a channel of bandwidth B hertz and signal-to-noise ratio SNR, the maximum reliable bit rate — the channel capacity — is:

Shannon–Hartley channel capacity

C = B · log₂(1 + SNR)

If you transmit at rate R < C, there exists a coding scheme that achieves arbitrarily low error rate. If R > C, there is no such scheme — errors are inevitable, no matter how clever the encoding. The proof is non-constructive: Shannon proved good codes exist, decades before anyone knew how to build them. Modern error-correcting codes (LDPC, Polar, Turbo) approach within a fraction of a dB of Shannon's limit. The race to close that gap is the story of half of late-twentieth-century communications engineering.

Fig 8.4 — Channel capacity as a function of SNR: C/B in bits per Hz, from noisy radio (~3 b/Hz) through Wi-Fi 6 (~6.5 b/Hz) and phone modems (~10 b/Hz) to fibre (~12–14 b/Hz). Above the curve C = B · log₂(1 + SNR), errors are inevitable.

Shannon's channel capacity curve: the maximum number of bits per second per Hertz of bandwidth as a function of signal-to-noise ratio. A 56k phone modem operated at roughly 30 dB SNR over a 4 kHz channel; the math gives 4000 × 10 ≈ 40 kbps, close to what was achieved in practice. Modern Wi-Fi 6 at ~20 dB SNR over 80 MHz channels achieves several hundred Mbps. Coherent fibre at ~15 dB effective SNR over tens of GHz of bandwidth pushes terabits per second on a single strand. The curve says nothing about how to achieve these rates — only that they cannot be exceeded. Every modulation, coding, and equalisation technique invented since 1948 is, in the end, a different way to climb closer to this single curve.
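The same arithmetic as a sketch in Python, using the rough SNR and bandwidth figures from the caption:

```python
from math import log2

def capacity_bps(bandwidth_hz: float, snr_db: float) -> float:
    """Shannon-Hartley: C = B * log2(1 + SNR), with SNR given in decibels."""
    return bandwidth_hz * log2(1 + 10 ** (snr_db / 10))

print(f"{capacity_bps(4e3, 30) / 1e3:.1f} kbps")   # 4 kHz phone line -> ~39.9
print(f"{capacity_bps(80e6, 20) / 1e6:.0f} Mbps")  # 80 MHz Wi-Fi 6  -> ~533
```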

04 — Encoding

How a one-or-zero rides on a continuous wave.

Knowing the channel can carry ten million bits per second tells you nothing about how to actually represent those bits as voltage, light, or radio. Encoding is the layer that maps abstract ones and zeros onto physical waveforms the receiver can decode. Different encodings exist because they trade different properties: clock recovery, bandwidth efficiency, robustness to noise, ease of synchronisation. None is optimal everywhere; each is right somewhere.

The most naive encoding is NRZ — Non-Return to Zero. High voltage means 1; low voltage means 0. The signal sits at one of two levels and only changes when a new bit arrives. It is dead simple. It also has a fatal weakness: a long run of identical bits looks like a flat line. The receiver, reading bits at some clock rate, has no way to know when one bit ends and the next begins. Drift even slightly out of sync and the entire stream is corrupted from there onward.

Manchester encoding solves the synchronisation problem by embedding the clock in the data. Every bit is split into two halves; a transition in the middle carries the bit value (high-to-low for one polarity, low-to-high for the other). There is a transition in every bit period, no matter what the data is. The receiver locks onto those transitions and stays in sync indefinitely. The cost is double bandwidth: every bit needs two half-bit slots. Original Ethernet (10 Mbps over coax) used Manchester encoding for exactly this reason — it could not afford to lose synchronisation on a long run of zeros.

Fig 8.5 — Manchester: every bit is a transition in the middle of its bit period, so clock recovery is automatic. Convention: high→low = 1, low→high = 0; the mid-bit edge carries the value.

Manchester encoding represents each bit as a transition in the middle of a fixed time slot. A high-to-low transition is one value (often "1"); a low-to-high is the other. Because there is always a transition mid-bit, the receiver can lock its clock onto the data stream regardless of how many identical bits arrive in a row. The cost is that the signal alternates twice as fast as the underlying bit rate, requiring double the bandwidth — a 10 Mbps Ethernet stream actually carries 20 million signal transitions per second. This is why Ethernet over Manchester encoding maxed out at 10 Mbps over typical Cat 3 twisted pair; the next jump to 100 Mbps required moving to a more efficient encoding (4B5B over Cat 5) plus clever line coding.

Fig 8.6 — Same bits, two waveforms, for the stream 1 0 0 0 0 1 1: NRZ uses half the bandwidth but needs a separate clock (four zeros in a row are a flat line); Manchester uses double the bandwidth but is self-clocking.

The same bit stream — 1 0 0 0 0 1 1 — encoded two ways. In NRZ, four consecutive zeros produce four identical time slots of low voltage; nothing in the signal tells the receiver where one zero ends and the next begins. If the receiver's clock drifts even slightly, the count is lost. In Manchester, every bit is a transition, so the receiver re-syncs its clock on every bit period regardless of the data. The trade is bandwidth: Manchester doubles the rate of signal changes per second. Modern high-speed encodings (8B/10B, 64B/66B) combine the bandwidth efficiency of NRZ with periodic forced transitions to maintain synchronisation — the best of both worlds, at the cost of a small overhead in encoded bits.
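A minimal sketch of the two encodings, modelling each bit period as two half-bit slots at level 0 or 1 (the Manchester polarity follows the convention in Fig 8.5):

```python
def nrz(bits):
    """NRZ: one voltage level per bit, held for both half-slots."""
    return [level for bit in bits for level in (bit, bit)]

def manchester(bits):
    """Manchester, per Fig 8.5: 1 -> high-then-low, 0 -> low-then-high."""
    return [level for bit in bits for level in ((1, 0) if bit else (0, 1))]

bits = [1, 0, 0, 0, 0, 1, 1]
print(nrz(bits))         # four zeros -> eight identical low half-slots (flat line)
print(manchester(bits))  # a mid-bit transition in every single bit period
```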

When the channel cannot carry baseband signals — radio, DSL, cable modems, optical fibre with multiple wavelengths — the bits ride on a carrier wave, a high-frequency sinusoid that the medium can transport. Modulation is the process of varying some property of the carrier in step with the data: amplitude (AM), frequency (FM), or phase (PSK). More elaborate schemes combine these — QAM (Quadrature Amplitude Modulation) varies amplitude and phase simultaneously, packing multiple bits per symbol.

Fig 8.7 — Three ways to ride bits on a carrier: AM (amplitude), FM (frequency), PSK (phase shift). QAM, used in Wi-Fi, cable, and LTE, combines amplitude and phase for 6, 8, or 10 bits per symbol.

Three ways to map digital bits onto a sinusoidal carrier. AM changes the amplitude — easy to see on an oscilloscope, vulnerable to noise (which mostly affects amplitude). FM changes the frequency — invented by Edwin Armstrong in 1933 specifically because it tolerates amplitude noise; this is why FM radio sounds clean during thunderstorms while AM does not. PSK changes the phase — robust, efficient, and the basis of every modern digital radio. Real-world systems compose these: QAM changes amplitude and phase together to pack 4, 6, 8, even 10 bits into a single symbol. Wi-Fi 6 uses 1024-QAM (10 bits per symbol) under good conditions, falling back to 64-QAM or QPSK as the channel degrades.

05 — Local networks

Metcalfe, 1973: many computers, one cable.

In 1973, Bob Metcalfe at Xerox PARC — fresh from doctoral work on packet networks — invented Ethernet. The original idea was elegant in its plainness: any number of computers share one cable; they all listen; whenever no one else is talking, anyone can send; if two start at the same time and the signals collide, both detect it, both back off for a random interval, and both try again. Carrier Sense, Multiple Access, with Collision Detection — CSMA/CD. That single idea, wrapped in increasingly fast versions of itself, became the dominant local-area networking technology on the planet.

Metcalfe arrived at the idea by an unusual route. His Harvard PhD thesis on packet networking had been rejected — the committee felt it lacked theoretical depth — and the rejection sent him into a foul mood and onto a flight to Hawaii, where he had heard about a packet-radio network called ALOHAnet built by Norm Abramson at the University of Hawaii. ALOHAnet let the islands' campuses share one radio frequency by transmitting whenever they had data and retransmitting after random delays when collisions occurred. Metcalfe analysed the math on the plane home, realised that smarter retransmission timing could push the protocol's utilisation far beyond what Abramson's analysis suggested, wrote the corrections into a revised thesis, and got his doctorate. He then took the same insight, improved with carrier sensing ("listen before transmit"), and applied it to a wire instead of radio. That was Ethernet. The thesis Harvard rejected became, with the addition of practical engineering, the protocol that connected most offices in the world.

The Xerox PARC of 1973 was, in retrospect, the densest concentration of computing talent ever assembled in one place. Down the hall from Metcalfe, Alan Kay's group was inventing the personal computer (the Alto, 1973 — the machine that taught Steve Jobs what a graphical interface was), Adele Goldberg and Dan Ingalls were building Smalltalk, Charles Simonyi was writing the editor that would lead to Microsoft Word, and Gary Starkweather was building the laser printer. Metcalfe's job was to network these machines together. His original Ethernet sketch is one of the artefacts in the Computer History Museum: a hand-drawn diagram on a piece of yellow paper, the date "May 22, 1973" in the corner, the basic architecture of every wired LAN since.

The first standardised Ethernet, 10BASE5, ran at 10 megabits per second over a single thick coaxial cable up to 500 metres long — the "thicknet" of the early 1980s. By the 1990s, switches replaced the shared cable; each device got its own dedicated cable to the switch, and collisions stopped happening because there was no one else on the wire. The collision-detection machinery is now mostly vestigial. But every Ethernet frame still carries the same header structure invented for that shared cable — including the famous 48-bit MAC addresses every network card has.

"Networking is inter-galactic."

— Bob Metcalfe, on a whiteboard at Xerox PARC, 1973
Fig 8.8 — Two stations collide; both detect it, both send a JAM, and both back off for different random intervals before retransmitting. Truncated binary exponential backoff: wait = random(0, 2ᵏ − 1) slot-times after the k-th collision, with k capped at 10.

Two stations sense an idle wire and start sending almost simultaneously. The signals collide somewhere along the cable. Both stations detect the collision (the voltage on the wire goes higher than either alone could produce), both stop, both broadcast a brief JAM signal so any third party knows to discard the partial frame, and both pick a random delay before retrying. The randomness is critical — if both used the same delay, they would collide again immediately. Ethernet uses truncated binary exponential backoff: after the k-th consecutive collision, each station waits a random number of slot-times in the range [0, 2ᵏ − 1]. The expected wait grows exponentially with congestion, so heavily-contended networks self-throttle. With switched Ethernet, this whole machinery is dormant — but it is still in every NIC's firmware, and the same random-backoff idea is what makes Wi-Fi (which still has shared-medium collisions) work.
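The backoff rule fits in a few lines. A sketch, assuming the classic 10 Mbps slot time of 51.2 µs (512 bit-times) and the standard give-up threshold of 16 attempts:

```python
import random

SLOT_US = 51.2  # one slot on classic 10 Mbps Ethernet: 512 bit-times

def backoff_slots(collisions: int) -> int:
    """Truncated binary exponential backoff: after the k-th consecutive
    collision, wait random(0, 2^k - 1) slots; k is capped at 10, and the
    frame is dropped after 16 attempts."""
    if collisions > 16:
        raise RuntimeError("excessive collisions: frame dropped")
    k = min(collisions, 10)
    return random.randint(0, 2**k - 1)

for attempt in range(1, 6):
    print(f"collision {attempt}: wait {backoff_slots(attempt) * SLOT_US:.1f} µs")
```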

Every Ethernet device has a MAC address — a 48-bit identifier burned into the network card at manufacture. The first 24 bits are the OUI (Organizationally Unique Identifier), assigned to the manufacturer by the IEEE; the next 24 are the device-specific portion. The MAC address tells you who built the network card. Apple and Cisco each own hundreds of prefixes; Raspberry Pi devices begin B8:27:EB. There are public databases mapping every OUI to its owner.
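Splitting a MAC into its halves takes one line each; here is a sketch with a toy lookup table (only the Raspberry Pi prefix comes from the text; the other entry is hypothetical):

```python
# Splitting a MAC into OUI + device halves, with a toy lookup table.
# Real databases hold tens of thousands of OUIs; the second entry here
# is hypothetical, for illustration only.
OUI_DB = {
    "B8:27:EB": "Raspberry Pi Foundation",
    "AA:BB:CC": "ExampleCo (hypothetical)",
}

def describe(mac: str) -> str:
    mac = mac.upper()
    oui, device = mac[:8], mac[9:]
    return f"OUI {oui} -> {OUI_DB.get(oui, 'unknown vendor')}, device {device}"

print(describe("b8:27:eb:9c:da:c7"))
# OUI B8:27:EB -> Raspberry Pi Foundation, device 9C:DA:C7
```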

MAC addresses identify devices on the local link. IP addresses, which we'll meet in Chapter 9, identify devices on the global network. ARP — Address Resolution Protocol — is the mechanism that bridges the two. When a host wants to send a packet to 192.168.1.5 on its own subnet, it broadcasts a question to every device on the link: "Who has 192.168.1.5? Tell me your MAC." The owner replies; the asker caches the answer for a few minutes; subsequent packets to that IP go straight to the matching MAC.

Fig 8.9 — A MAC address, and the ARP cache that resolves it. The 48 bits split into an OUI half (B8:27:EB, Raspberry Pi Foundation) and a device-specific half (9C:DA:C7); the ARP cache maps local IPs to MACs ("who has 192.168.1.5?" is broadcast, the owner replies, and the answer is cached for ~5 minutes).

A MAC address splits cleanly into manufacturer (OUI) and device-specific halves. The OUI B8:27:EB is owned by the Raspberry Pi Foundation, so this address is on a Pi. ARP keeps a small cache mapping known IP addresses on the local subnet to their MAC addresses. When a host wants to send to an IP not in the cache, it broadcasts an ARP request ("who has 192.168.1.5?"); the owner replies with a unicast ARP response. Entries expire after a few minutes so that disconnected devices don't accumulate. ARP has no authentication — anyone on the link can claim any IP, which is the basis of ARP spoofing (Chapter 15) — but on a properly switched modern LAN, the threat is contained because spoofing requires being on the local segment to begin with.

Fig 8.10 — Hub vs switch: a hub repeats every frame to every port; a switch learns which MAC is on which port by observing traffic — a self-configuring table — and forwards each frame only where it belongs.

A hub is a dumb electrical repeater: every signal that arrives on any port is rebroadcast on every other port. Four hosts on a hub means every frame is heard by all four; collisions are routine; bandwidth is shared. A switch is intelligent: it observes the source MAC of every frame that arrives, learns which MAC sits on which port, and forwards each subsequent frame only to the port leading to its destination. Other ports stay quiet. Modern Ethernet is universally switched, with each cable a private link between one device and one port. The collision-detection machinery still exists in every NIC for compatibility, but on switched links it never fires. This single change — hub to switch, late 1990s — is what made Ethernet scale from megabits to gigabits to terabits per second.

🛡️

MAC flooding — when a switch forgets it is a switch. A switch's MAC table has finite size — usually a few thousand entries on consumer hardware, tens of thousands on enterprise gear. An attacker with access to one port can rapidly send frames with a million different fabricated source MACs. The switch dutifully records each one, fills the table to capacity, and falls back to what it does when it does not know which port a destination MAC sits on: broadcast to every port. The switch has been downgraded to a hub. Every frame on the LAN is now visible to the attacker. This attack — called MAC flooding or sometimes CAM-table overflow — was demonstrated by Mike Beekey in 2000 and is the reason serious networks deploy port security (sticky MAC, max-MACs-per-port) on every access switch. It is also a useful illustration of a recurring pattern in networking: the polite, table-based, learning-by-listening designs of the 1990s assumed every participant played fair, and an attacker who declined to play fair could turn cooperative infrastructure against its users.

06 — OSI & TCP/IP layer models

Why every textbook draws the same cake.

The reason network engineers can talk about network problems precisely is the layer model. Each layer does one thing, and only one thing. Each layer is a customer of the layer below and a provider for the layer above. Once you've drawn the layers, every protocol falls into exactly one slot, and every conversation about networking becomes "which layer is this happening at?"

The canonical reference is the OSI seven-layer model, standardised by the International Organization for Standardization in 1984. Bottom to top: Physical, Data Link, Network, Transport, Session, Presentation, Application. Each layer is named after the abstraction it provides. Physical moves bits over a medium. Data Link moves frames between adjacent nodes. Network moves packets across a global graph of nodes. Transport moves byte streams reliably or messages unreliably end to end. Session, Presentation, and Application are about what the bytes mean.

The TCP/IP four-layer model is the one that actually runs the internet. It collapses Physical and Data Link into a single "Link" layer (since the IP layer doesn't care which kind of link is below it), and collapses Session, Presentation, and Application into a single "Application" layer (since real applications usually handle their own session semantics). The result is shorter, more honest, and describes deployed reality better — but the OSI model remains the teaching reference because its separations are pedagogically cleaner.

Fig 8.11 — The OSI seven-layer cake (ISO, 1984): 7 Application (HTTP, DNS, SSH, SMTP) · 6 Presentation (TLS, gzip, MIME, UTF-8) · 5 Session (RPC, NetBIOS, SOCKS) · 4 Transport (TCP, UDP, QUIC) · 3 Network (IP, ICMP, BGP, OSPF) · 2 Data Link (Ethernet, Wi-Fi, ARP, MAC) · 1 Physical (copper, fibre, radio, NRZ).

Each OSI layer is named after the abstraction it provides to the layer above. Physical turns bits into voltage / light / radio. Data Link bundles bits into frames addressed by MAC and shipped across a single hop. Network bundles frames into packets that find their way across the global graph by IP address. Transport turns packets into reliable byte streams (TCP) or messages (UDP). Session manages connection setup and teardown. Presentation handles data formatting, encryption, compression. Application is where the real protocol lives — HTTP, SSH, DNS. In practice the top three layers blur together — most application protocols handle their own session and presentation logic — which is why the simpler TCP/IP four-layer model is what's actually deployed.

Fig 8.12 — OSI vs TCP/IP, side by side: OSI's seven textbook layers against the four that actually run — Application (HTTP, DNS, TLS, SSH), Transport, Internet, Link (Ethernet plus physical). OSI is pedagogically clean; TCP/IP is honest about what real protocols do.

The seven OSI layers map to the four TCP/IP layers in a clear way. The bottom two (Physical + Data Link) collapse into TCP/IP's Link layer because, from IP's perspective, anything that delivers a frame to the next hop is good enough. The top three (Session + Presentation + Application) collapse into TCP/IP's Application layer because in practice every real application protocol handles its own session and formatting — HTTP, for instance, does its own connection management, its own content negotiation, its own compression. Use OSI when you need to be precise about which layer a feature belongs to. Use TCP/IP when you're describing what's actually on the wire.
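Encapsulation, the mechanical heart of any layer model, can be sketched as nested envelopes. The field subsets below are illustrative only; real headers carry far more, as Fig 9.4 shows for IP:

```python
# Encapsulation as nested envelopes. Illustrative field subsets only --
# real headers carry far more (Fig 9.4 shows the full IP layout).
def http_request() -> bytes:                  # Application layer
    return b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"

def tcp_segment(payload: bytes) -> bytes:     # Transport: ports around the data
    return b"[TCP 49152->80]" + payload

def ip_packet(payload: bytes) -> bytes:       # Network: IPs around the segment
    return b"[IP 192.0.2.42->198.51.100.7]" + payload

def ethernet_frame(payload: bytes) -> bytes:  # Link: MACs, valid for one hop only
    return b"[ETH B8:27:EB:9C:DA:C7->8C:85:90:1F:33:42]" + payload

wire = ethernet_frame(ip_packet(tcp_segment(http_request())))
print(wire)  # each layer wraps the one above; Physical finally sends the bytes
```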

The seam to Chapter 9

Chapter 8 has been about the physical and link layers — how a single bit makes it across one hop. Chapter 9 climbs to the network layer. Up to now, every diagram has assumed two devices on the same wire, or at most a few devices on the same Ethernet segment. The internet is something else: billions of devices, on tens of thousands of independently-administered networks, none of them aware of most of the others, and packets that need to find their way across that graph from any source to any destination. The idea that made this work was small, strange, and explicitly Cold War. It is the subject of the next chapter.

Chapter 09

Packets — How the Internet Decided to Work

The phone company spent a century building a network where every call got its own dedicated wire. ARPANET, in 1969, made a different bet: chop messages into pieces, label each piece with a destination, and let it find its own way through the graph. That single architectural decision — packets not circuits — made the modern internet possible. It also made it possible for a misconfigured router in Pakistan to take YouTube off the air for the whole world for two hours.

Topics: Packet switching · IP · BGP · IPv4/v6 · spoofing & hijacks
Era covered: 1964 → present
01 — The phone company's view

A century of dedicated wires.

Before the internet, the dominant model for carrying messages between two points across a continent was the telephone network — and the telephone network was, architecturally, a vast machine for setting up dedicated end-to-end paths between exactly two parties. When you placed a call, a sequence of mechanical or electronic switches in central offices found a free wire from your handset, through trunk lines, to the handset on the other end. That entire path was reserved for your call, moment by moment, until you hung up. The model is called circuit switching, and it had been the right answer to the question of how to carry a voice for almost exactly a hundred years.

Circuit switching has elegant properties. The path is established once, so the latency through the call is constant. There is no congestion mid-call: the bandwidth is yours alone for the duration. But it scales poorly for what computers need to do. A computer-to-computer conversation is not a continuous voice signal; it is a burst of activity followed by a long silence, then another burst. Holding a dedicated circuit for a connection that is silent 99% of the time wastes 99% of the wire. And every new pair of correspondents needs its own end-to-end reservation, which means the network has to know in advance how many simultaneous connections to plan for.

The alternative was packet switching: chop every message into small numbered chunks, write the destination on each chunk, and dump them into the network. The network's only job is to get each chunk closer to its destination. Different chunks may take different paths, arrive out of order, get duplicated, or get lost — and the receiving end is responsible for reassembling and asking for retransmissions. The wires are shared by everyone. A burst from one sender uses bandwidth that a moment later is being used by someone else's burst. There are no reservations, no setup phase, no central authority that has to grant a circuit before traffic flows.

Fig 9.1 — One call, two architectures: circuit switching reserves one dedicated path, held for the duration; packet switching lets chunks find their own paths through a shared graph and reassemble at the far end. Circuits win on latency guarantees; packets win on utilisation and scale.

In circuit switching, the network reserves a fixed path from caller to callee for the entire call. The wires along that path carry only this conversation, even when nobody is speaking. In packet switching, the network is a shared graph; each chunk of message is labelled with its destination and routed independently. Different chunks may take different paths, may arrive out of order, may even get lost — and the receiver is responsible for putting things back together. The phone network ran on circuits for a hundred years and worked beautifully for voice. Computers are bursty and many — packets fit them better. The decision to build the early ARPANET on packets rather than circuits is the architectural choice that made the modern internet possible.

02 — ARPANET 1969

Four nodes. A Cold War. The internet's first heartbeat.

The first message on what became the internet was two letters long and broke the receiving computer. The date was 29 October 1969, the time 22:30 Pacific. Charley Kline, a graduate student in Leonard Kleinrock's lab at UCLA, sat at a terminal connected by a leased 50-kbps telephone line to the Stanford Research Institute, 560 kilometres up the coast. On the other end was Bill Duvall. Their goal was modest: log in remotely, type the word LOGIN, see if the host accepted it. Kline typed L. It arrived. He typed O. It arrived. He typed G — and the SRI machine, which had been trying to be helpful by autocompleting LOGIN when it saw the third letter, crashed under the surprise. The two letters that had crossed the line, LO, were the first packets ever sent on what would become the internet. They got the system back up an hour later, and the connection held for the rest of the evening. By December there were four nodes — UCLA, Stanford Research Institute, UC Santa Barbara, the University of Utah — and the network had a name: ARPANET.

The intellectual ground had been laid years earlier, by three people working independently. Paul Baran at the RAND Corporation, between 1960 and 1964, was working on the most pressing strategic problem of the age: how to keep the United States' command structure functioning during and after a Soviet nuclear strike. The Cuban Missile Crisis had just shown the world how close to thermonuclear war it was; the existing AT&T long-distance telephone network was a hierarchical tree of switching centres, and the loss of any few of those centres would have cut the country into pieces. Baran's RAND memoranda — eleven of them, eventually published as On Distributed Communications — argued that the answer was a different architecture entirely: a richly-connected mesh, no central authority, messages broken into "message blocks" each routed independently, survivable in the face of arbitrary node loss. The Air Force liked the idea; AT&T hated it (their position was, more or less, that you couldn't run a real network without circuits, and they refused to build one); the project stalled politically. Baran moved on. The papers sat in RAND's library waiting to be picked up by the right reader.

Donald Davies at the UK's National Physical Laboratory arrived at the same architecture independently in 1965, named the thing — "packet switching," his phrase — and built a small working network at NPL by 1969. Leonard Kleinrock at MIT, in his 1962 doctoral thesis, had developed the queueing-theoretic mathematics showing that statistical multiplexing on a packet-switched network would utilise bandwidth far more efficiently than circuit switching could ever achieve. Kleinrock then took a faculty position at UCLA. When Larry Roberts at ARPA — the Advanced Research Projects Agency of the US Department of Defense — was tasked in 1966 with building the experimental network the agency wanted, Roberts read all three sets of work, and the design that emerged was unmistakably Baran's mesh, Davies's terminology, and Kleinrock's math, with BBN's engineering providing the IMP (Interface Message Processor) routers. Kline was Kleinrock's grad student. The "LO" message was, in some sense, Baran's idea finally going live, seven years after the Cuban Missile Crisis that had motivated it.

Fig 9.2 — The first internet: ARPANET's four nodes in December 1969 (UCLA, SRI, UCSB, Utah), and how it grew — 15 nodes by 1971, trans-Atlantic links to UCL and NORSAR in 1973, the Cerf–Kahn TCP paper in 1974, ~300 hosts by 1981, the TCP/IP cutover in 1983, decommissioning in 1990. "LO", the first packets, went UCLA → SRI on 29 Oct 1969 at 22:30 PT.

The original ARPANET: four nodes connected by leased 50 kbps lines, switched by Interface Message Processors (IMPs) — refrigerator-sized minicomputers from BBN that handled the packet routing. The first message went from UCLA to SRI; the receiving system crashed after the second character, but the packets had been delivered correctly. Within four years the network spanned the continent and crossed the Atlantic. In 1974 Vint Cerf and Bob Kahn published the paper that defined TCP/IP. In 1983 ARPANET completed its switch from the original NCP protocol to TCP/IP — the day the protocols of the modern internet became the actual protocols on the wire. By 1990 ARPANET was decommissioned, having long since been absorbed into the larger network it had spawned.

Fig 9.3 — Baran 1964: three architectural shapes, drawn for the US Air Force — centralized (destroy the centre, the network dies), decentralized (destroy any one hub, the others continue), distributed (destroy any subset, the rest reroute). The third shape was the internet's topology argument.

Baran's 1964 RAND memoranda included these three sketches. A centralised network has one big switch through which all traffic flows; lose the centre and everything dies. A decentralised network has several regional hubs, each handling its area; lose any one hub and only that region is hurt. A distributed network has no hubs at all — every node connects to several neighbours, traffic finds its way through the mesh. Lose any subset of nodes and the rest reroute around the gaps. Baran's argument: only the third topology survives a nuclear strike. The argument generalised: it also survives router failures, fibre cuts, ISP bankruptcy, BGP misconfigurations, and every other quotidian way networks break. The internet is a Baran-shaped network.

03 — IP, the Internet Protocol

The packet, byte by byte.

A packet on the wire is a precisely-shaped structure of bytes. The first twenty bytes are the IP header — a fixed-format arrangement of fields that every router on the path reads to figure out where the packet should go next, whether to fragment it, when to drop it, and what the higher-layer protocol is. Everything after the header is payload — what the sender actually wanted to deliver. The header is tiny by modern standards, designed in 1981 when every byte cost real money to transmit, and it is essentially unchanged four decades later.

Fig 9.4 — The IPv4 header, byte by byte: 20 bytes, RFC 791, 1981. Version, IHL, Type of Service, Total Length; Identification, Flags, Fragment Offset; TTL, Protocol, Header Checksum; then the 32-bit source and destination addresses. Routers really care about three fields — TTL (decrement), Total Length (forward), Destination (which way?). The source field is set by the sender and never checked — hence IP spoofing.

The IPv4 header is 20 bytes, packed tight, dating to 1981. Reading byte-by-byte: Version (4 = IPv4), IHL (header length in 4-byte units, almost always 5), ToS (Type of Service — historically priority hints, now used for DSCP and ECN), Total Length (header + payload, up to 65 535 bytes). Then Identification + Flags + Fragment Offset for fragmentation reassembly. Then TTL (decremented at every router; if it reaches 0, the packet is dropped — this is what bounds traceroute), Protocol (6 = TCP, 17 = UDP, 1 = ICMP — what's in the payload), and Header Checksum. The last eight bytes are the source and destination IP addresses. The destination is the only field most routers care about. The source is set by the sender and never verified — which is why IP spoofing exists.
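The fixed part of the header is small enough to unpack in a dozen lines of Python. A sketch; the example header is hand-built, with the checksum left at zero:

```python
import socket
import struct

def parse_ipv4_header(raw: bytes) -> dict:
    """Unpack the fixed 20-byte IPv4 header of Fig 9.4 (RFC 791)."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "ihl": ver_ihl & 0xF,                 # header length, in 4-byte words
        "total_length": total_len,
        "id": ident,
        "flags": flags_frag >> 13,
        "frag_offset": flags_frag & 0x1FFF,   # in 8-byte units
        "ttl": ttl,
        "protocol": {1: "ICMP", 6: "TCP", 17: "UDP"}.get(proto, proto),
        "src": socket.inet_ntoa(src),         # set by sender, never verified
        "dst": socket.inet_ntoa(dst),         # the field routers route on
    }

# Hand-built example: v4, IHL=5, 40 bytes total, ID=42, TTL 64, TCP,
# 192.0.2.42 -> 198.51.100.7, checksum left at zero for the sketch.
hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 42, 0, 64, 6, 0,
                  socket.inet_aton("192.0.2.42"),
                  socket.inet_aton("198.51.100.7"))
print(parse_ipv4_header(hdr))
```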

A packet larger than the maximum transmission unit (MTU) of a link cannot be sent in one piece. Ethernet's standard MTU is 1500 bytes; some tunnel configurations are smaller. When a 4 KB packet must traverse a 1500-byte link, the router (or, increasingly, the original sender) splits it into fragments — each a complete IP packet with the same Identification field, but with the Fragment Offset indicating where in the original payload it sits, and a "More Fragments" flag in the Flags field. The receiving end reassembles. Fragmentation is generally avoided in modern networks; IPv6 forbids router-side fragmentation entirely.

Fig 9.5 — A 4 KB packet, fragmented onto a 1500-byte link: three fragments, all carrying ID=42, at offsets 0, 185, and 370 (in 8-byte units), with the More-Fragments flag clear only on the last. Lose any fragment and the whole packet is lost. IPv6 forbids router fragmentation; the sender must discover the path MTU upfront.

Fragmentation. A 4000-byte packet enters a link whose MTU is 1500 bytes. The router (or sender) breaks it into three fragments; each is a complete IP packet with the same Identification field (42 here) and a Fragment Offset that says where in the original payload it sits. Offsets are in 8-byte units, so 1480 bytes of payload corresponds to offset 185. The "More Fragments" flag is set on every fragment except the last. The receiver buffers fragments by ID, sorts them by offset, and reassembles when it sees MF=0. If any fragment is lost in flight, the entire packet is lost — IP has no way to ask for just one fragment to be retransmitted. Modern systems avoid fragmentation by discovering the smallest MTU on the path (Path MTU Discovery) and sending packets that fit. IPv6 forbids router-side fragmentation entirely; only the original sender can fragment, and only after probing.
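The offset arithmetic is worth working through once. A sketch that reproduces the three fragments of Fig 9.5:

```python
def fragment(payload_len: int, mtu: int, header_len: int = 20):
    """Split a payload for a link with the given MTU, as in Fig 9.5.
    Fragment data must be a multiple of 8 bytes (except the last piece)
    because the offset field counts 8-byte units."""
    max_data = (mtu - header_len) // 8 * 8    # 1500 -> 1480 bytes per fragment
    frags, pos = [], 0
    while pos < payload_len:
        size = min(max_data, payload_len - pos)
        frags.append({"offset": pos // 8, "bytes": size,
                      "MF": int(pos + size < payload_len)})
        pos += size
    return frags

for frag in fragment(4000, 1500):
    print(frag)
# {'offset': 0,   'bytes': 1480, 'MF': 1}
# {'offset': 185, 'bytes': 1480, 'MF': 1}
# {'offset': 370, 'bytes': 1040, 'MF': 0}
```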

04 — BGP & routing

The graph that nobody owns.

The internet is not one network. It is tens of thousands of independent networks, each operated by a different organisation, none of them under common ownership or control, all glued together by a single protocol that lets them announce routes to one another. That protocol is BGP, the Border Gateway Protocol. It dates to 1989, runs on every backbone router on the planet, and was designed with essentially no security at all — a fact the internet still has not fully recovered from.

Each independent network is called an Autonomous System (AS) and gets a number — Cogent is AS 174, Google is AS 15169, your ISP has its own. ASes connect to each other at peering points; the connections form a graph where the nodes are ASes and the edges are direct relationships. BGP is how an AS tells its neighbours: "I can reach the following IP prefixes; here is the path of ASes the traffic will take." Each AS that hears the announcement adds itself to the path and re-announces to its own neighbours, propagating the route across the whole graph.

Fig 9.6 — BGP path-vector announcements: AS 65001 originates 198.51.0.0/16 with path [65001]; transit ASes 65010 and 65020 prepend themselves; AS 65030 picks one and announces [65030, 65010, 65001] onward to AS 65099. Each AS prepends itself, shorter paths win by default, and loops are detected in the list.

AS 65001 announces the prefix 198.51.0.0/16 with path [65001] — "I am the origin." Two transit ASes hear the announcement and prepend themselves: AS 65010 announces [65010, 65001], AS 65020 announces [65020, 65001]. Both reach AS 65030, which now sees two paths to the same destination and picks the shorter one (or applies local policy — most BGP routing decisions are policy, not pure shortest-path). AS 65030 prepends itself and announces [65030, 65010, 65001] onward to AS 65099, which now knows it can reach the prefix in three AS hops. Loop detection is automatic: an AS that sees its own number in the path drops the announcement. This is BGP at its simplest. The protocol has a few hundred more pages, but the path-vector core is this.
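The path-vector core really is small. A toy simulation of the graph in Fig 9.6, under the simplifying assumption that shortest-path is the only policy:

```python
# Toy path-vector propagation over the AS graph of Fig 9.6: prepend
# yourself, prefer the shortest path, drop any path containing your own
# number. (Real BGP layers policy on top of all three steps.)
neighbours = {65001: [65010, 65020], 65010: [65030], 65020: [65030],
              65030: [65099], 65099: []}

routes = {}                                              # AS -> selected AS_PATH
queue = [(peer, [65001]) for peer in neighbours[65001]]  # origin's adverts
while queue:
    asn, path = queue.pop(0)
    if asn in path:                                 # loop detection
        continue
    if asn not in routes or len(path) < len(routes[asn]):
        routes[asn] = path                          # shorter path wins
        for peer in neighbours[asn]:
            queue.append((peer, [asn] + path))      # prepend self, re-announce

for asn in sorted(routes):
    print(f"AS {asn} reaches 198.51.0.0/16 via {routes[asn]}")
# AS 65099 ends up with a three-hop path such as [65030, 65010, 65001]
```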

Fig 9.7 — The internet's tier hierarchy: roughly 10–20 Tier-1 ASes (global, settlement-free peering — Cogent, Lumen, NTT, Telia, Tata), a few hundred Tier-2 (regional — national ISPs, CDNs, cloud providers, paying Tier 1 for transit), and tens of thousands of Tier-3 (local — your ISP, your enterprise, your home router's upstream).

The commercial structure of the internet, simplified. Tier 1 is the small set of ASes that have settlement-free peering with each other and reach every other AS without paying anyone. There are roughly a dozen — Cogent, Lumen (formerly Level 3), NTT, Telia, Tata, Telecom Italia, GTT, and a few more. Tier 2 are large regional or national operators who peer with each other but pay one or more Tier-1s for traffic to the rest of the internet. Tier 3 is everyone else — the ISP your home is on, your university, your cloud provider's edge. The packet your laptop just sent climbs this hierarchy: from your home router up through your ISP (Tier 3 → Tier 2), maybe across one Tier 1 to another Tier 1, back down to a Tier 2 in the destination's region, into a Tier 3 ISP, and finally to the server. The graph is held together by BGP announcements and commercial contracts. There is no central authority.

05 — IPv4 vs IPv6

The thirty-year migration that still isn't done.

IPv4 addresses are 32 bits long. That is 4 294 967 296 possible addresses, which sounded like a lot in 1981 — and in practice the usable total is well below that, because of subnetting overhead and reserved ranges. The world ran out of fresh IPv4 allocations at IANA's central pool on 3 February 2011, and at the regional registries shortly after. The successor — IPv6, defined in 1995 — uses 128-bit addresses, 2⁹⁶ times as many. The migration to IPv6 has been underway for thirty years and is still incomplete; depending on how you count, somewhere between 35% and 45% of internet traffic is IPv6 in 2025.

Fig 9.8 — IPv4 ran out (IANA's free pool exhausted on 3 February 2011), then CGNAT papered over it: thousands of customers behind one public IPv4 address, while IPv6 deployment climbed to roughly 40%. IPv4 never really ran out — it got multiplexed via NAT, at the cost of end-to-end connectivity.

The IANA central pool of IPv4 allocations exhausted in February 2011. The regional registries followed over the next several years. The world did not break — instead, ISPs adopted Carrier-Grade NAT (CGNAT), which shares one public IPv4 address among hundreds or thousands of customers behind a translation layer. CGNAT works for outgoing connections (web browsing, video streaming) but breaks any protocol that needs incoming connections to specific addresses (peer-to-peer, hosting servers from home, SIP). IPv6 deployment has grown steadily since — about 40% of traffic in 2025 — but full migration is still distant. The internet runs on a hybrid where most users are IPv6-capable but most servers are still dual-stack, and the translation between them is happening every nanosecond at every CGNAT box on the planet.

Fig 9.9 — The shape of an IPv4 vs an IPv6 address: 32 bits written as four decimal octets (192.0.2.42, 4.3 billion possibilities) vs 128 bits written as eight 16-bit hex groups (2001:0db8:85a3:0000:0000:8a2e:0370:7334, abbreviable via "::"). 2¹²⁸ ≈ 3.4 × 10³⁸ — enough for every grain of sand on Earth.

An IPv4 address is 32 bits, conventionally written as four decimal octets like 192.0.2.42. An IPv6 address is 128 bits, conventionally written as eight 16-bit groups in hexadecimal, separated by colons. IPv6's address space is so large — 2¹²⁸ ≈ 3.4 × 10³⁸ — that the engineering trade-off completely flips: addresses are no longer scarce, so the protocol can spend them lavishly. Every device gets a public address. Subnets are 64 bits wide on every link. Stateless autoconfiguration lets a device pick a globally unique address automatically. The price for the abundance is that every router, every operating system, every firewall, every networking library on Earth had to be modified — which is why thirty years in, the migration is still not done.
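Python's standard ipaddress module makes the size difference tangible:

```python
import ipaddress

v4 = ipaddress.ip_address("192.0.2.42")
v6 = ipaddress.ip_address("2001:0db8:85a3:0000:0000:8a2e:0370:7334")

print(int(v4))  # 3221226026 -- the 32-bit integer under the dotted quads
print(v6)       # 2001:db8:85a3::8a2e:370:7334 -- "::" collapses the zero run
print(f"{2**32:,} IPv4 addresses vs {2**128:.1e} IPv6 addresses")
```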

06 — Spoofing, hijacks, route leaks

Trust the source field at your peril.

Recall from Fig 9.4 that the IP source address is set by the sender and not verified anywhere along the path. There is no equivalent of a return address being checked against the postmark; the receiver simply sees whatever the sender wrote. This is IP spoofing, and it is built into the protocol. Most modern ISPs filter outgoing packets to stop their own customers from spoofing arbitrary sources (BCP 38 — best current practice from 2000), but enforcement is uneven. Spoofed packets still flow.

🛡️

The Morris worm — 2 November 1988 — the moment computer security became a discipline. Robert Tappan Morris, a twenty-three-year-old graduate student at Cornell whose father was the chief scientist at the NSA's National Computer Security Center, released a self-replicating program onto the early internet that exploited three different vulnerabilities at once: a buffer overflow in the UNIX fingerd service, a debug feature accidentally left on in sendmail, and weak passwords in rsh. Each instance of the worm, on landing on a new host, would replicate itself to nearby hosts. A design miscalculation — Morris made the worm re-infect an already-compromised machine one time in seven, to defeat administrators who might fake an infection — meant machines were re-infected dozens of times, until they collapsed under the load. Within hours, somewhere between five thousand and ten thousand machines (rough estimates suggest about ten percent of the entire internet of the time) were unusable. Morris was traced within days and became the first person ever convicted under the US Computer Fraud and Abuse Act of 1986. He received three years' probation, four hundred hours of community service, and a $10 050 fine. He is now a tenured professor at MIT. Within a week of the worm's release, DARPA funded the creation of the Computer Emergency Response Team (CERT) at Carnegie Mellon — the first organisation specifically tasked with coordinating response to internet security incidents. Every national CERT in every country today traces directly to that decision. The protocols of Chapter 9 had been built in an era of institutional trust; the Morris worm was the proof that the era of institutional trust was over.

Fig 9.10 — IP spoofing: a forged source field
Fig 9.10 — IP spoofing: a forged source field the sender writes the source field — and nobody checks it attacker 203.0.113.7 "real" IP SRC: 198.51.100.5 (forged!) DST: 192.0.2.10 (target) target 192.0.2.10 reply (if any) goes to 198.51.100.5 — not to the attacker → no two-way conversation, BUT plenty of one-way attacks work fine → DNS amplification, SYN floods, NTP reflection — all use spoofed sources BCP 38 (2000) ASKS ISPS TO FILTER · ENFORCEMENT IS UNEVEN · SPOOFED TRAFFIC IS STILL ROUTINE

An attacker writes any source IP they want into the IP header and sends. Routers in between forward the packet based on the destination only. The target receives the packet with the forged source. This is fine for one-way attacks: a TCP SYN flood (Chapter 10) needs only to cause the target to allocate state for a half-open connection, never to receive the reply. DNS and NTP amplification attacks send small spoofed queries that elicit large responses — all directed at the spoofed victim, not the attacker. BCP 38 (2000) recommends every ISP filter outgoing packets so that customers cannot spoof addresses that are not theirs to use; a quarter-century later, deployment of BCP 38 is incomplete and spoofed-source attacks remain routine. The fix exists. The internet has not finished installing it.
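To make the forged-source point concrete, here is what "writing the source field" literally looks like: twenty bytes packed by hand. A sketch in Python; the checksum is the real internet checksum, but actually transmitting this would require a raw socket and root privileges, which is deliberately omitted.

import struct, socket

def ipv4_header(src, dst, payload_len):
    version_ihl = (4 << 4) | 5                   # IPv4, five 32-bit words of header
    header = struct.pack("!BBHHHBBH4s4s",
        version_ihl, 0, 20 + payload_len,        # version/IHL, TOS, total length
        0x1234, 0,                               # identification, flags/fragment offset
        64, socket.IPPROTO_UDP, 0,               # TTL, protocol, checksum placeholder
        socket.inet_aton(src),                   # source: four bytes the sender chooses
        socket.inet_aton(dst))                   # destination: the only field routers use
    s = sum(struct.unpack("!10H", header))       # internet checksum: ones'-complement sum
    while s >> 16:
        s = (s & 0xFFFF) + (s >> 16)
    return header[:10] + struct.pack("!H", ~s & 0xFFFF) + header[12:]

forged = ipv4_header(src="198.51.100.5", dst="192.0.2.10", payload_len=0)
print(forged.hex())                              # a valid header wearing someone else's source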

A more dramatic class of attack exploits BGP itself. Recall: each AS announces which prefixes it can reach, and other ASes trust those announcements. There is, in the original protocol, no mechanism for the hearer to verify that the announcer actually owns the prefix it claims. An AS can announce anything and its neighbours will believe it. If the announcement is more specific than the legitimate one, the hijacker wins the global routing table. This is BGP hijacking, and the canonical example happened on 24 February 2008.

Fig 9.11 — Pakistan/YouTube, 24 February 2008
Fig 9.11 — Pakistan/YouTube, 24 February 2008 how a domestic block became a global outage in two minutes AS 36561 YouTube AS 17557 Pakistan Telecom AS 3491 PCCW (HK) ① 18:47 UTC — Pakistan Telecom announces 208.65.153.0/24 to block YouTube domestically ② their upstream PCCW (AS 3491) leaks the announcement to the global internet ③ /24 is more specific than YouTube's /22 → wins everywhere → YouTube globally offline ~2 hours

On 24 February 2008, Pakistan Telecom (AS 17557) decided to block YouTube domestically by announcing the prefix 208.65.153.0/24 to its own routers, redirecting traffic to a null route. By accident, the announcement was leaked upstream to PCCW (AS 3491), which propagated it to the rest of the internet. Because /24 is more specific than YouTube's legitimate /22, every BGP router on Earth preferred the more-specific route — and now sent YouTube's traffic to Pakistan, where it was being null-routed. YouTube went off the air globally for about two hours, until the announcement was withdrawn and the legitimate routes propagated back. There was no malice and no exploit; one local engineering decision, one configuration mistake, and a protocol with no source authentication produced a planet-scale outage. RPKI (Resource Public Key Infrastructure) and BGPsec are partial answers, deployed slowly. As of 2025, the underlying issue — that BGP trusts whatever an AS says — remains.
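The "more specific wins" rule is small enough to demonstrate. A sketch with Python's standard ipaddress module; the host address is an illustrative value that sits inside both prefixes:

import ipaddress

routes = {
    ipaddress.ip_network("208.65.152.0/22"): "AS 36561 (YouTube, legitimate)",
    ipaddress.ip_network("208.65.153.0/24"): "AS 17557 (Pakistan Telecom, leaked)",
}

addr = ipaddress.ip_address("208.65.153.80")     # an address covered by both prefixes
best = max((net for net in routes if addr in net), key=lambda net: net.prefixlen)
print(best, "->", routes[best])                  # 208.65.153.0/24 -> AS 17557: the /24 wins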

Fig 9.12 — A route leak: traffic takes the wrong door
Fig 9.12 — A route leak: traffic takes the wrong door a route leak detours traffic through an AS that should not be transit SRC DST normal path — direct, ~50 ms leak AS leaking customer route as if it were transit ↓ leaked path — packet takes the wrong door

A route leak is the gentler cousin of a hijack: an AS that has heard a route announcement re-announces it to a peer that should not have received it. Most commonly, an AS that buys transit from two providers accidentally re-announces routes from one provider to the other, becoming, in effect, a free transit provider between them. Traffic that was supposed to take a direct, fast path now detours through the leaking AS — sometimes through entirely the wrong continent. There is no malice; just a misconfiguration in a router somewhere. Because BGP has no notion of "this announcement should not be propagated" baked into the protocol, leaks ripple. The IETF's RFC 7908 catalogues the standard leak types. The defences (route filters, the IRR database, RPKI ROAs, ASPA) all live in the same uncomfortable place: technically possible, partially deployed, defeated by the fact that the protocol fundamentally trusts what its peers say.

The seam to Chapter 10

Chapter 9 has built the layer that gets a packet from any source to any destination — best-effort, possibly out-of-order, possibly lost. Chapter 10 builds reliability on top of that unreliability. The protocol it describes — TCP — is older than the web, older than most of the people reading this, and still carries somewhere over half of all internet traffic, with the rest split between UDP-based applications and the new QUIC protocol that abandons TCP entirely. The mathematics that makes TCP work is control theory; the engineering that scales it is forty years of careful refinement. We will spend Chapter 10 inside it.

Chapter 10

TCP — The
Problem of
Reliability

IP delivers packets best-effort: lossy, out-of-order, sometimes duplicated. For voice and DNS that's fine — UDP rides directly on top and applications cope. For everything else — file transfer, web pages, long-lived connections — somebody has to make an unreliable network behave reliably. TCP is that somebody. Its mathematics is control theory; its engineering is forty years of careful refinement; its attack surface is the reason every server has a SYN flood story.

TopicsUDP · 3-way handshake · AIMD · CUBIC · BBR · QUIC · SYN flood
Era covered1974 → present
Chapter 10 hero · TCP — The Problem of Reliability client seq=X server seq=Y SYN SYN-ACK ACK three segments. one connection.
01 — UDP first

The simplest thing that could possibly work.

Before TCP, before the elaborate machinery of reliability, there is the simplest possible thing you can do with IP: just send packets. Don't number them. Don't track them. Don't ask whether they arrived. That is UDP — the User Datagram Protocol, RFC 768, 1980, three pages long. An eight-byte header on top of IP. No connection. No retransmission. No ordering. No back-pressure. And yet UDP carries an enormous fraction of modern internet traffic — DNS, every video call, every game, every voice chat, every QUIC connection — because for these applications, the simplicity is the feature.

The trade is explicit. UDP says: I will hand your bytes to the IP layer; whether they arrive, in what order, possibly duplicated, possibly lost, is your problem. Most applications find this terrifying. A few find it liberating. A live phone call cannot wait for retransmission — a packet that arrives 200 milliseconds late is worse than a packet that didn't arrive at all (the human ear notices a hiccup; it does not notice a 20-millisecond gap). A DNS lookup is a single small request and a single small response — TCP's three-segment setup overhead would double the lookup's latency to no benefit. For these workloads, UDP is not a downgrade from TCP; it is the right tool.

Fig 10.1 — UDP: eight bytes, no state, no apologies
Fig 10.1 — UDP: eight bytes, no state, no apologies UDP header — RFC 768, August 1980, three pages 0 16 31 Source Port 16 bits · sender's port (0 if unused) Destination Port 16 bits · receiving service (e.g. 53 = DNS) Length 16 bits · header + data Checksum 16 bits · optional in IPv4, mandatory in IPv6 payload — whatever the application wants opaque to UDP — DNS query, RTP audio frame, QUIC packet… what UDP does NOT do: — no connection setup — no retransmission of lost packets — no ordering, no flow control, no state where it wins: — DNS (single round trip · 53/udp) — voice / video (loss < latency) — games · QUIC · NTP · DHCP

UDP is essentially "an envelope around your bytes." Eight bytes of header — source port (where it came from), destination port (which service to deliver to), length, optional checksum — and then your payload. There is no concept of a "connection" between two UDP endpoints; each packet stands alone. There is no notion that a previous packet was sent, or that a future packet will be. For applications that need exactly that — fire-and-forget, real-time, single-shot — UDP is the right protocol. For applications that need a reliable byte stream, TCP picks up where UDP refuses to.
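The whole protocol fits in a few lines of socket code. A sketch of both ends on the loopback interface (the port number is arbitrary); notice everything that is not here: no connect, no handshake, no retransmission, no ordering.

import socket

# receiver (one terminal): bind, wait for one datagram, answer it
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 9999))
data, peer = rx.recvfrom(2048)       # blocks until a datagram arrives
rx.sendto(data.upper(), peer)        # no connection: every reply is addressed per-packet

# sender (another terminal): no handshake, no state, just send
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b"hello", ("127.0.0.1", 9999))
print(tx.recvfrom(2048)[0])          # b'HELLO', if both datagrams survived the trip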

02 — Why TCP

Four engineering goals, written 1974, still in force.

In 1974, Vint Cerf and Bob Kahn published "A Protocol for Packet Network Intercommunication" in the IEEE Transactions on Communications. The paper described a protocol — they called it simply TCP — designed to do four specific things on top of an unreliable packet network: segment a long byte stream into manageable chunks, order them so the receiver sees them in the right sequence, ensure their reliable delivery despite arbitrary loss, and apply flow control so a fast sender does not overwhelm a slow receiver. Every piece of machinery TCP carries is in service of one of those four goals.

The architecture follows. Every byte of the application's stream is assigned a sequence number — a 32-bit counter, incremented for each byte, that lets either side say exactly which part of the stream a segment carries. The sender retains a buffer of sent-but-not-yet-acknowledged bytes; if no acknowledgement arrives within a timeout, it retransmits. The receiver maintains a buffer of out-of-order arrivals and reassembles in sequence-number order. Receivers periodically advertise a window — how many more bytes they have buffer space to accept — and senders never have more than that many in flight. Each of these is straightforward in isolation. Together, on a network that loses ten percent of its packets, they create the illusion of a reliable byte stream that loses nothing.

The cost of all this is state. Every TCP connection — client and server — keeps a kernel-level data structure called the TCB (Transmission Control Block) tracking sequence numbers, window sizes, retransmit timers, the path's measured round-trip time, the congestion-control state, and a few dozen other values. A modern Linux server holding a million TCP connections simultaneously holds a million TCBs in kernel memory, with all the bookkeeping that implies. UDP's strength is that it has no state. TCP's strength is that it has all the state that reliability requires.

03 — The three-way handshake

Three segments. Two synchronised counters. One conversation.

A TCP connection begins with a three-segment exchange. SYN, SYN-ACK, ACK. Three round-trip halves, ending with both sides knowing each other's starting sequence numbers, both sides having allocated a TCB, both sides ready to stream. Why three? With two segments, the server's starting sequence number would never be acknowledged. Four would be redundant. Three is exactly what you need so that each side has both proven the other is reachable and acknowledged the other's starting counter.

Fig 10.2 — The three-way handshake, in time
Fig 10.2 — The three-way handshake, in time CLIENT SERVER CLOSED LISTEN SYN, seq=X "I want to talk; my ISN is X" SYN_SENT SYN_RCVD SYN-ACK, seq=Y, ack=X+1 "my ISN is Y; I got yours" ACK, seq=X+1, ack=Y+1 "got yours; counters synced" ESTABLISHED ESTABLISHED 1.5 RTTS BEFORE THE FIRST APPLICATION BYTE — A PERMANENT TAX TCP CHARGES PER CONNECTION

Three segments cross the network. The SYN carries the client's chosen Initial Sequence Number (ISN), randomly selected for security. The SYN-ACK carries the server's ISN and acknowledges the client's by setting ack = X+1. The final ACK closes the loop the same way, acknowledging the server's ISN with ack = Y+1. After this, both sides know both sequence numbers and the connection is ESTABLISHED. The whole exchange takes 1.5 round trips before a single byte of application data can flow — a permanent tax that motivated TFO (TCP Fast Open) and, eventually, the design of QUIC, which folds connection setup into the encryption handshake to save round trips.

The state both endpoints traverse during connection setup, data transfer, and teardown is captured in TCP's finite state machine — a diagram every networking student commits to memory at some point. There are eleven states. Most connections move through them in a single well-trodden path: CLOSED → SYN_SENT → ESTABLISHED → FIN_WAIT_1 → FIN_WAIT_2 → TIME_WAIT → CLOSED. The off-path states exist to handle simultaneous opens, simultaneous closes, lost FINs, and the dozen other partial-failure modes the protocol was designed to survive.

Fig 10.3 — TCP's state machine
Fig 10.3 — TCP's state machine CLOSED LISTEN SYN_SENT SYN_RCVD ESTABLISHED FIN_WAIT_1 CLOSE_WAIT FIN_WAIT_2 LAST_ACK TIME_WAIT CLOSING listen() recv SYN recv ACK connect() recv SYN-ACK / send ACK close() recv FIN recv FIN close() recv ACK 2·MSL timeout SERVER PATH (RED) · CLIENT PATH (GREEN) · CLOSE PATH (GOLD) — 11 STATES, 1981 RFC 793

TCP's eleven-state finite state machine, the canonical version from RFC 793 (1981). The active opener (client) goes CLOSED → SYN_SENT → ESTABLISHED. The passive opener (server) goes CLOSED → LISTEN → SYN_RCVD → ESTABLISHED. Connection teardown is the four-state dance on the right and bottom: each side independently decides to close, sends a FIN, waits for the peer's ACK, then waits for the peer's FIN, then waits a final 2·MSL in TIME_WAIT (twice the Maximum Segment Lifetime: two minutes per RFC 793, though Linux waits 60 seconds in total) before fully closing — to ensure any straggler packets from the previous incarnation of this connection have died out before the four-tuple (src IP, src port, dst IP, dst port) can be reused.
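The well-trodden paths fit in a small transition table. A sketch, not the full diagram: the simultaneous-open and simultaneous-close corners of RFC 793 are omitted, and the event names are informal labels.

TRANSITIONS = {
    ("CLOSED",      "connect/send SYN"):      "SYN_SENT",
    ("SYN_SENT",    "recv SYN-ACK/send ACK"): "ESTABLISHED",
    ("CLOSED",      "listen"):                "LISTEN",
    ("LISTEN",      "recv SYN/send SYN-ACK"): "SYN_RCVD",
    ("SYN_RCVD",    "recv ACK"):              "ESTABLISHED",
    ("ESTABLISHED", "close/send FIN"):        "FIN_WAIT_1",
    ("FIN_WAIT_1",  "recv ACK"):              "FIN_WAIT_2",
    ("FIN_WAIT_2",  "recv FIN/send ACK"):     "TIME_WAIT",
    ("TIME_WAIT",   "2MSL timeout"):          "CLOSED",
    ("ESTABLISHED", "recv FIN/send ACK"):     "CLOSE_WAIT",
    ("CLOSE_WAIT",  "close/send FIN"):        "LAST_ACK",
    ("LAST_ACK",    "recv ACK"):              "CLOSED",
}

state = "CLOSED"                 # the active opener's whole life, event by event
for event in ["connect/send SYN", "recv SYN-ACK/send ACK",
              "close/send FIN", "recv ACK", "recv FIN/send ACK", "2MSL timeout"]:
    state = TRANSITIONS[(state, event)]
    print(f"{event:24} -> {state}")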

04 — Flow and congestion control

How fast is too fast? Ask the network.

After the handshake, the real engineering begins. The sender must decide how fast to send. Too fast: the receiver drops packets, queues in the network overflow, and the connection wastes work and degrades. Too slow: bandwidth sits idle. The right speed changes from millisecond to millisecond as other flows enter and leave the network. TCP has no central authority telling it the answer; each connection must figure it out empirically, in real time, using only what it can observe — acknowledgements arriving (or not arriving) — as feedback. The result is one of the most beautiful pieces of distributed feedback control ever deployed.

The first piece is the sliding window. The receiver advertises, in every ACK it sends, the size of its receive buffer that remains free — the receive window. The sender promises never to have more than that many bytes in flight (sent but not yet acknowledged). As ACKs arrive, the trailing edge of the window advances; new bytes can be sent at the leading edge. The window slides forward through the byte stream as data is acknowledged. Receiver-side flow control ensures the sender never overwhelms the receiver's ability to consume.

Fig 10.4 — The sliding window
Fig 10.4 — The sliding window sender's view of the byte stream: 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 acknowledged in flight (unACKed) window space future / can't send yet ← window: 8 bytes → ACK 104 arrives → window slides right by 4: 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 ← window slides forward 4 bytes → RECEIVER ADVERTISES WINDOW IN EVERY ACK · SENDER NEVER EXCEEDS IT · NATURAL FLOW CONTROL

The sender's view of the byte stream is divided into four regions: bytes already acknowledged (safe to forget), bytes in flight (sent but unconfirmed — kept in retransmit buffer), bytes within the window (allowed to send but not yet sent), and the future (cannot send yet — outside the window). When ACK 104 arrives, the window's trailing edge advances, freeing the in-flight bytes and creating room at the leading edge for new bytes to enter the window. The window slides through the stream as ACKs arrive. The receiver controls the window size, so a slow consumer naturally throttles a fast producer — flow control with no extra messaging.
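The four regions and the slide are a few lines of arithmetic. A sketch mirroring Fig 10.4's numbers: a window of 8 bytes, the stream at position 100, then ACK 104 arrives.

una, nxt, wnd = 100, 104, 8      # oldest unACKed byte, next byte to send, window size

def report(una, nxt, wnd):
    print(f"acked <{una}  |  in flight {una}..{nxt - 1}  |  "
          f"may send {nxt}..{una + wnd - 1}  |  blocked >={una + wnd}")

report(una, nxt, wnd)            # in flight 100..103, may send 104..107
una = 104                        # ACK 104 arrives: everything below 104 is confirmed
report(una, nxt, wnd)            # nothing in flight (104..103 is empty), may send 104..111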

The sliding window prevents the sender from outrunning the receiver. It does nothing to prevent the sender from outrunning the network. That is what congestion control is for, and it is the deeper of the two problems. The story of how it entered TCP is worth telling in detail.

In October 1986, the bandwidth between Lawrence Berkeley National Laboratory and UC Berkeley — two sites four hundred yards and a few router hops apart — collapsed from its design capacity of 32 kilobits per second to roughly 40 bits per second. A factor of eight hundred. The collapse was sudden, the path was lightly loaded, no hardware had failed, and no one knew why. The two endpoints assumed they could simply send at their advertised window size; the network's queues filled; packets were dropped; TCP retransmitted; the retransmits piled into the same queues; the queues overflowed again. Each retransmit produced more retransmits. The connection's effective throughput fell by three orders of magnitude while neither side was, locally, doing anything wrong. Van Jacobson at LBL, working with Michael Karels of Berkeley's UNIX group, spent the better part of the next two years investigating what was happening. Their observation — published in 1988 as "Congestion Avoidance and Control" — was one of the great results of computer networking: TCP's existing flow-control machinery (the receive window) accounted for the receiver but said nothing about the network in between, and a sender with no reason ever to slow down for the network would always, eventually, congest it to death. The paper's fix added two pieces — slow start and AIMD — that became part of every TCP stack in the world over the next four years. Without that fix, the internet of the 1990s would not have existed; every link of significant utilisation would have spent its life collapsing. The 1986 incident is one of those quiet moments where computer science had to grow a new sub-discipline overnight, in response to a phenomenon nobody had seen before. The sub-discipline it grew is the one this section describes.

The algorithm has two phases. Slow start begins cautiously — a tiny congestion window (cwnd) of one segment — and doubles on every successful round trip. Exponential growth probes the network's capacity quickly. When packet loss is detected (a timeout, or three duplicate ACKs), the connection assumes it has hit the network's capacity and switches to congestion avoidance: cwnd is halved, then increases by one segment per RTT — linearly. Slow probing replaces fast probing. Another loss triggers another halving. The window oscillates around the true capacity in a sawtooth pattern.

Fig 10.5 — The TCP sawtooth
Fig 10.5 — The TCP sawtooth cwnd (segments) 0 10 20 30 time (RTTs) slow start cwnd × 2 / RTT loss → ½ congestion avoidance — cwnd + 1 / RTT — until next loss true capacity (unknown to TCP)

Slow start (green): exponential — cwnd doubles every round trip until the first loss. Congestion avoidance (red): additive — cwnd grows by one segment per RTT, until loss. Each loss halves the window. Repeat. The window oscillates around the network's true capacity in this sawtooth shape, sometimes called the AIMD curve. TCP never knows the capacity; it discovers it by overshooting and being punished for the overshoot, then backing off and probing slowly upward. The whole machinery runs in every flow simultaneously, in distributed harmony. Loss is the only feedback signal, and loss is the universal language every router in between speaks (by dropping packets when its queues fill).
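The sawtooth takes about ten lines to reproduce numerically. A toy model under one invented assumption: a fixed capacity of 32 segments that the sender never sees directly, with loss whenever cwnd exceeds it.

capacity = 32                        # the network's true capacity: TCP never sees this
cwnd, ssthresh = 1, 64
for rtt in range(40):
    if cwnd > capacity:              # a queue overflowed somewhere: loss detected
        ssthresh = max(cwnd // 2, 1) # remember half the window that failed
        cwnd = ssthresh              # multiplicative decrease
    elif cwnd < ssthresh:
        cwnd *= 2                    # slow start: exponential probing
    else:
        cwnd += 1                    # congestion avoidance: additive increase
    print(rtt, cwnd)                 # plot this column and Fig 10.5 appears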

Why AIMD — Chiu & Jain, 1989

The genius of AIMD — Additive Increase, Multiplicative Decrease — is not just that it works, but that it is the only combination that works. Chiu and Jain proved in 1989 that, of the four possible combinations of additive/multiplicative increase paired with additive/multiplicative decrease, only AIMD converges to both efficiency (the network is fully utilised) and fairness (every flow gets an equal share). MIMD preserves the ratio between flows, so an unfair start stays unfair forever. AIAD preserves the absolute gap between them — the same problem in different clothes. MIAD diverges outright. AIMD is the unique stable point. It is one of those rare results in engineering where the right answer is forced by the mathematics, not chosen by taste.

Fig 10.6 — AIMD's fairness, in phase space
Fig 10.6 — AIMD's fairness, in phase space flow A's allocated bandwidth → flow B's bandwidth → A + B = C capacity A = B (fair) optimal: efficient AND fair start: A > B add → up the 45° diagonal multiplicative ½ → toward origin moves closer to fairness line why AIMD converges — Chiu & Jain 1989

A two-flow phase plane (also called the Chiu–Jain plot). The horizontal axis is flow A's bandwidth allocation; the vertical, flow B's. Points above the red line exceed total capacity (someone loses). Points on the green line are fair (A = B). The optimal target is the intersection: full capacity, equal share. AIMD's geometry forces convergence: additive increase moves both flows up the 45° diagonal (the absolute gap between them stays the same but shrinks as a proportion of each). Multiplicative decrease on loss moves both flows toward the origin along a ray (preserving the ratio A:B, which means the absolute gap shrinks). Each cycle moves the system closer to the fair-and-efficient point. No central coordination. Just two flows, both running AIMD locally, both converging to fairness automatically. Among the most elegant distributed-systems results in computing.
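The convergence can be watched numerically too. Two flows sharing a toy 100-unit link, one starting with eight times the other's share, both blindly running AIMD; the capacity and increment are invented values for illustration:

capacity = 100.0
a, b = 80.0, 10.0                   # flow A starts with eight times flow B's share
for step in range(30):
    if a + b > capacity:            # overload: both flows see loss, both halve
        a, b = a / 2, b / 2
    else:                           # spare capacity: both add the same increment
        a, b = a + 5, b + 5
    print(f"A={a:6.2f}  B={b:6.2f}  ratio={a/b:5.2f}")
# the ratio falls toward 1.00: additive steps shrink the gap as a proportion,
# multiplicative halving keeps the proportion fixed, so every cycle is fairer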

05 — CUBIC, BBR, QUIC

Three answers to TCP's mid-life crisis.

Classic TCP — Tahoe, Reno, NewReno — was designed for the late-1980s internet, where round-trip times were tens of milliseconds and bandwidths a few megabits. The modern internet has flows that span continents (200 ms RTT), gigabit endpoints, satellite links with 700 ms RTTs, and mobile connections whose radio conditions change every second. Classic TCP loses badly under these conditions; the slow linear growth phase takes too long to fill a fat-pipe long-RTT link, and the loss-as-signal model fails on lossy wireless links where most loss has nothing to do with congestion. The last two decades produced three answers — each more radical than the last.

CUBIC (2008, Linux's default) keeps loss-based detection but replaces the linear additive-increase phase with a cubic function. After a loss, the window grows slowly at first, then faster, then slows again as it approaches the previous maximum. The shape is tuned to fill a long-fat pipe quickly without overshooting. Modern Linux defaults to CUBIC; it is what the great majority of internet TCP traffic actually runs.

Fig 10.7 — CUBIC vs Reno: same loss, different recovery
Fig 10.7 — CUBIC vs Reno: same loss, different recovery time after loss event (RTTs) cwnd loss · cwnd halved Reno — linear · slow on fat pipes CUBIC — slow near old peak, then aggressive probe upward previous peak (W_max)

After the same loss event, Reno's window grows linearly (one segment per RTT) — fine on the slow links of 1988, terrible on a 100 ms / 1 Gbps link where it takes thousands of RTTs to refill. CUBIC's window growth is a cubic polynomial centred on the previous loss-causing window: it grows slowly near that previous peak (cautiously revisiting where it failed), then accelerates above it (probing for new capacity). The shape is tuned by the cubic constant and the saved W_max. CUBIC fills modern long-fat-pipe links in a few seconds where Reno would take minutes. It has been Linux's default since kernel 2.6.19 (2006).
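The curve itself is one formula from the CUBIC paper: W(t) = C·(t − K)³ + W_max, where K is chosen so the curve starts at the post-loss window and crosses W_max exactly at t = K. A sketch with the paper's standard constants:

C, beta = 0.4, 0.3                  # the paper's standard constants
W_max = 100.0                       # cwnd (segments) at the last loss event

K = (W_max * beta / C) ** (1 / 3)   # seconds until the curve returns to W_max
for t in [0.0, 1.0, 2.0, 3.0, round(K, 2), 5.0, 6.0, 7.0]:
    W = C * (t - K) ** 3 + W_max
    print(f"t={t:5.2f}s  cwnd={W:6.1f}")
# t=0 gives 70.0 (the post-loss window, 0.7 of W_max), t=K gives 100.0,
# and beyond K the cubic term takes off: cautious near the old peak, then fast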

BBR (2016, Google) is more radical: forget loss as a signal. Instead, measure the network directly — bandwidth and round trip time — and aim for the bandwidth-delay product (BDP), the precise amount of in-flight data that fills the pipe without overflowing buffers. BBR is what Google deploys at its edge for YouTube and Search. In challenging environments (lossy Wi-Fi, transcontinental links, buffer-bloated home routers), BBR can be two to ten times faster than CUBIC. It is also more aggressive against competing CUBIC flows, which remains a source of debate.

Fig 10.8 — BBR aims for the bandwidth-delay product
Fig 10.8 — BBR aims for the bandwidth-delay product "the pipe" — bandwidth × RTT = in-flight bytes that fill it exactly RTT (one-way time × 2) ≈ 100 ms bandwidth · 1 Gbps BDP = bandwidth × RTT = 1 Gbps × 100 ms = 12.5 MB of in-flight bytes — exactly the right amount BBR measures both directly · keeps cwnd at this value · avoids buffer-bloat entirely

A network "pipe" between two endpoints has two dimensions: bandwidth (how many bytes per second can flow) and round-trip time (how long the bytes stay in transit). The product, BDP = bandwidth × RTT, is the amount of in-flight data that fills the pipe exactly. Less than BDP and the pipe is partly empty (under-utilisation). More than BDP and the excess piles up in some router's queue, increasing latency and eventually causing loss (this is "buffer bloat"). BBR — Bottleneck Bandwidth and Round-trip propagation time — measures both quantities continuously and tries to keep cwnd at exactly BDP. It is loss-agnostic: a packet lost to wireless interference is not interpreted as a congestion signal, only as a packet to retransmit. Google deploys BBR at the edges of its own infrastructure; YouTube playback in the developing world is meaningfully smoother because of it.

QUIC (Google 2012, IETF standard 2021) is the most radical answer of all: abandon TCP entirely. QUIC runs over UDP and builds reliability, encryption, multiplexing, and congestion control all in one combined protocol that lives in user space, not the kernel. Connection setup folds into the TLS 1.3 handshake — zero-RTT setup is possible for repeat connections. Loss in one HTTP request stream doesn't block other streams (the "head-of-line blocking" that hampered HTTP/2 over TCP). QUIC is what HTTP/3 runs on. By 2024 it carried somewhere over 25% of internet traffic, and the share is growing.

Fig 10.9 — TCP+TLS+HTTP vs QUIC: the same job, two stacks
Fig 10.9 — TCP+TLS+HTTP vs QUIC: the same job, two stacks classic — three layers, three handshakes HTTP/1.1 or HTTP/2 request multiplexing in HTTP/2 TLS 1.2 / 1.3 encryption · auth · 1–2 RTT setup TCP reliability · ordering · congestion · 1 RTT setup IP routing · best-effort delivery TCP setup + TLS setup = 2–3 RTTs before first byte QUIC — TLS, multiplexing, transport in one HTTP/3 stream API on top of QUIC QUIC reliability + ordering + congestion control + TLS 1.3 + multiplexed streams + 0-RTT resumption · in user space connection ID survives IP changes UDP just packets · no state · no setup first request can ship in 1 RTT, or 0-RTT on resume QUIC moves transport into user space, where it can iterate as fast as the application HTTP/3 = HTTP/2 SEMANTICS · QUIC TRANSPORT · UDP PACKETS · HARDWIRED TLS

QUIC collapses three classical layers (TCP, TLS, HTTP/2) into one user-space protocol over UDP. The benefits are concrete: 0-RTT resumption (a returning client can ship its first request alongside the handshake), per-stream multiplexing (loss in one HTTP request doesn't block the others, fixing HTTP/2-over-TCP head-of-line blocking), connection migration (the connection ID lives above IP, so a phone switching from Wi-Fi to cellular keeps its QUIC connection alive — TCP, whose connections are named by the IP-and-port four-tuple, could not), and iterability (because QUIC is in user space, both endpoints can deploy new versions without needing OS kernel updates). The trade is that QUIC reimplements all of TCP's reliability machinery in user space, including congestion control — Linux kernel TCP is still faster per packet because it's been hand-optimised for thirty-five years. As of 2025, all major browsers, all major CDNs, and Google's whole edge run HTTP/3 over QUIC.

06 — SYN floods, cookies, and the rest

The price of state.

TCP's strength is that the kernel remembers every connection. TCP's weakness is the same: an attacker who can make the kernel allocate state without ever completing a connection can, with very little effort, exhaust the server's connection tables and lock out legitimate users. The classical attack — the SYN flood — is more than thirty years old and still works against unprotected servers.

The attack uses one feature of the three-way handshake: the server allocates a TCB the moment it receives a SYN, well before the third ACK arrives. An attacker sends thousands of SYN packets per second, each with a different (often spoofed) source IP. The server replies with SYN-ACK to each spoofed source — which goes nowhere or to an uninvolved third party — and waits for the third ACK that never arrives. Each half-open connection consumes a TCB slot for tens of seconds (the SYN_RCVD timeout). With a sufficient SYN rate, the server's connection table fills and legitimate clients cannot connect.

Fig 10.10 — SYN flood: half-open connections fill the table
Fig 10.10 — SYN flood: half-open connections fill the table attacker spoofed sources 10 000 SYN/sec target's connection table (SYN_RCVD slots) half-open from 198.51.100.7 half-open from 203.0.113.3 half-open from 192.0.2.45 half-open from 198.51.100.99 half-open from 203.0.113.41 half-open from 198.51.100.5 half-open from 192.0.2.11 half-open from 203.0.113.78 TABLE FULL legitimate clients refused EACH SLOT HELD ~30S WAITING FOR THIRD ACK · ATTACKER SUSTAINS THE FLOOD INDEFINITELY

A flood of SYN packets, each with a different (often spoofed) source IP, arrives at the target. The target's kernel allocates a TCB for each one and parks it in SYN_RCVD waiting for the third ACK. Because the source addresses are spoofed, the SYN-ACK responses go to addresses that didn't ask for anything; no third ACK ever comes. Each half-open entry sits in the table until the SYN_RCVD timeout fires (typically tens of seconds). At a few thousand SYN packets per second — easy for a single home connection to generate — the table fills. New legitimate connections are refused. The classical attack from the 1990s; still works against unprotected servers; the reason every modern OS ships with SYN cookie support and the reason cloud DDoS protection (Cloudflare, AWS Shield) exists.

The defence — invented by Daniel J. Bernstein in 1996 — is SYN cookies. The trick: when the connection table is under pressure, the server stops allocating state for incoming SYNs at all. Instead, it encodes the necessary state into the Initial Sequence Number it returns in the SYN-ACK. The encoding is a keyed hash of the (source IP, source port, destination IP, destination port, time bucket) — small enough to fit in 32 bits, hard to forge without knowing the server's secret key. If the attacker's third ACK ever arrives (it won't, because the source was spoofed), the server can verify the cookie, reconstruct the connection state from scratch, and proceed. If the legitimate client's third ACK arrives, same thing. The server kept no per-connection state during the flood.

Fig 10.11 — SYN cookies: the state goes into the wire
Fig 10.11 — SYN cookies: the state goes into the wire when the SYN backlog gets full, the server stops allocating state — and lets the client carry it ① SYN arrives src=A, port=p, seq=X ② compute cookie cookie = hash(secret, A, p, … + time bucket + MSS hint) ③ SYN-ACK with seq = cookie no TCB allocated if 3rd ACK ever comes back… ④ ACK arrives, ack=cookie+1 recompute hash, compare if matches → reconstruct state ⑤ now allocate the TCB connection becomes ESTABLISHED all state derived from the cookie no per-connection memory was used during the flood — only verified ACKs cause state to be created Daniel J. Bernstein, 1996 · enabled by default in modern Linux when the SYN backlog overflows

Under SYN-flood pressure, the server skips the TCB allocation entirely. It computes a cryptographic cookie — a keyed hash of the connection's identifying tuple plus a time bucket plus a few bits encoding the MSS hint the server would normally have remembered. The cookie is returned as the server's Initial Sequence Number. Now the state is in the wire, not in the kernel. If a real client completes the handshake, its ACK number will be cookie+1; the server recomputes the hash from the ACK's tuple, verifies it matches, and only then allocates the TCB. An attacker cannot forge cookies (the secret is unknown) and cannot benefit from spoofed sources (the matching ACK never comes back). The server keeps zero per-connection state during the flood — and legitimate connections still work. Daniel J. Bernstein, 1996. One of the most beautiful pieces of security engineering ever shipped.
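The trick is small enough to sketch. A toy version in Python that uses an HMAC where real kernels use a purpose-built hash, and ignores the MSS-hint bits they squeeze into the low-order positions; the shape is the same: the ISN is a keyed function of the tuple and a coarse time bucket, so the server can verify without remembering.

import hmac, hashlib, os, time

SECRET = os.urandom(16)                   # the server's private key

def make_cookie(src_ip, src_port, dst_ip, dst_port, t=None):
    bucket = int(t if t is not None else time.time()) // 64    # 64-second buckets
    msg = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}@{bucket}".encode()
    return int.from_bytes(hmac.new(SECRET, msg, hashlib.sha256).digest()[:4], "big")

def verify(ack, src_ip, src_port, dst_ip, dst_port):
    now = time.time()                     # accept the current or previous bucket,
    return any(                           # in case the handshake straddled an edge
        ack == make_cookie(src_ip, src_port, dst_ip, dst_port, now - d) + 1
        for d in (0, 64))

isn = make_cookie("198.51.100.5", 40321, "192.0.2.10", 443)        # the SYN-ACK's seq
print(verify(isn + 1, "198.51.100.5", 40321, "192.0.2.10", 443))   # True: allocate the TCB
print(verify(12345,   "198.51.100.5", 40321, "192.0.2.10", 443))   # False, almost surely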

Beyond the SYN flood, two other classical TCP attacks deserve note. RST injection — an attacker on the path injects a TCP RST segment with a sequence number in the receiver's window, causing the receiver to tear down the connection. The Great Firewall of China uses this for censorship: when the firewall sees a request it dislikes, it sends RST segments to both ends, killing the connection and blaming the network. TCP hijacking — when an attacker can predict the sequence numbers of an existing connection, they can inject data into it pretending to be one of the endpoints. The attack had been described academically by Robert T. Morris (later of the worm) in a 1985 Bell Labs technical report, but it was considered theoretical for a decade after — until Kevin Mitnick, on Christmas Day 1994, used it on the live internet for the highest-profile intrusion of the era.

Mitnick's target was Tsutomu Shimomura, a security researcher at the San Diego Supercomputer Center known for his work on cellular phone security. Shimomura's home network ran on trusted rsh — a UNIX command that, on the assumption that the source IP could not be forged, granted login access without a password to authorised hosts. The catch: the source IP could be forged, if you also knew TCP's sequence numbers, and Mitnick had spent weeks fingerprinting the not-quite-random ISN generator on Shimomura's machines. On Christmas Day, Mitnick mounted a multi-step attack: he SYN-flooded a trusted host on Shimomura's network to take it out of the conversation, then opened a TCP connection to Shimomura's main workstation while spoofing the source IP of the now-incapacitated trusted host. Predicting the ISN, Mitnick blind-sent the third ACK plus an immediate echo + + >> /.rhosts command — adding a wildcard entry to the trust file that would let him log in normally afterward. The attack worked. Shimomura discovered the intrusion two days later, traced Mitnick's activity across multiple sites (some of it through cellular networks Shimomura himself had earlier helped secure), and led the FBI to Mitnick's apartment in Raleigh, North Carolina, on 15 February 1995. The case made the front page of The New York Times; a book (Shimomura and John Markoff's Takedown) and a film based on it followed. The technical lessons that came out of it — randomise ISNs, do not trust source IPs for authentication, authenticate connections cryptographically rather than by network identity — are the foundation on which TLS, the subject of Chapter 11, would be built.

The random ISN requirement was retrofitted into every operating system within a year of Mitnick's arrest. Modern stacks generate ISNs cryptographically (RFC 6528, 2012) so that sequence numbers cannot be predicted even with full knowledge of previous connections. Sequence-number-based hijacking, which had been one of the canonical TCP attacks for fifteen years, became mostly historical. The deeper conclusion — that the network can never authenticate who you are, only what you say, and so applications must authenticate cryptographically on top — is the conclusion that produced TLS, certificates, and the entire modern HTTPS web.

🛡️

The recurring pattern. TCP gives you reliability, ordering, and back-pressure — but each strength is built from kernel-side state, and that state is the attack surface. The defences are nearly all about making the protocol stateless under attack: SYN cookies (state in the wire), random ISNs (state in the unguessable), TLS underneath (state cryptographically authenticated). Every layer of the network stack, from here onward, exhibits the same arc: protocol invented for a reason; protocol exploited; protocol patched in a way that pushes state into harder places. Chapter 11's TLS handshake will follow the same shape, only with mathematics, not just hashes, doing the cryptographic heavy lifting.

The seam to Chapter 11

Chapter 10 has built a reliable byte stream on top of an unreliable packet network. Chapter 11 puts the actual web on top: HTTP for fetching documents, DNS for finding servers by name, TLS for keeping every byte of the conversation private and authenticated. The stack is now: voltage on copper (Chapter 8) → IP packet (Chapter 9) → reliable TCP byte stream (Chapter 10) → HTTP request and TLS encryption (Chapter 11). Two chapters from now the stack is complete and we'll have followed every byte of an HTTPS request from voltage to JSON.

Chapter 11

HTTP · DNS · TLS
The Web

The internet is not the web. The internet is the packet network of Chapters 8 through 10 — voltage on copper, IP datagrams, reliable TCP streams. The web is the layer above it: three protocols invented in 1989 at a particle physics lab, plus a handful of cryptographic primitives invented in the 1970s by mathematicians, that together turn a network of packets into a worldwide library of trustworthy documents. By the end of this chapter you will be able to read every byte of an HTTPS request from voltage to JSON.

TopicsHTTP · DNS · TLS · CA chain · HTTP/2 · QUIC
Era covered1989 → present
Chapter 11 hero · HTTP · DNS · TLS The Web IP · best-effort packets TCP · reliable byte stream TLS · encrypted, authenticated GET / HTTP/1.1 every byte verified by mathematics
01 — Berners-Lee 1989

A proposal that nobody asked for.

In March 1989, a 33-year-old British physicist working at CERN — the European nuclear research lab outside Geneva — submitted a 17-page document to his manager titled Information Management: A Proposal. His name was Tim Berners-Lee. The document described a system for linking documents across computers using any network — extending an idea already implemented in CERN's internal database. His manager, Mike Sendall, wrote in the margin of the cover page: "Vague but exciting." He approved the project, partly to give Berners-Lee something to do. By Christmas 1990 the world's first web server was running on a black NeXT workstation in Building 31 at CERN.

The setting matters. CERN in 1989 was the world's largest particle physics laboratory: thousands of scientists from dozens of countries collaborating on experiments that took years to plan and produced terabytes of data. The work was inherently distributed — a detector built in Italy ran software written in France against data analysed in Sweden, all reporting to a paper in Physical Review with co-authors in twelve countries. Documents lived everywhere: research papers on FTP servers, calibration tables in flat files on shared disks, equipment manuals on a secretary's PC, design notes in a homegrown CERN database called ENQUIRE. Finding anything required knowing where to look. Sharing anything new required emailing it as an attachment to people you guessed might want it. The information existed; following it from one document to another required a human acting as the link.

Berners-Lee's proposal was, in essence, that the link itself should be part of the document. A reference in one document should point at another document on another computer — and a click on that reference should fetch and display the target. This is the idea of hypertext, and it was not new: Vannevar Bush had described something like it in 1945 ("As We May Think," The Atlantic); Ted Nelson had named it in 1965 and spent three decades trying to build a perfect version (Project Xanadu, which never shipped). What Berners-Lee added was the recognition that hypertext's bottleneck was no longer the idea — it was the plumbing. TCP/IP was widespread by 1989. UNIX workstations were common. The hard problem was just defining a few simple conventions — how documents are addressed, how they are requested, how they are formatted — and then implementing them.

He defined three:

  • URL — Uniform Resource Locator. A way to write down where any document lives, in any system. http://info.cern.ch/hypertext/WWW/TheProject.html — protocol, host, path. The address-on-an-envelope of the web.
  • HTTP — HyperText Transfer Protocol. The procedure for one machine to ask another for a document at a URL and receive it back. Originally five lines of text on the wire.
  • HTML — HyperText Markup Language. A simple format for documents that includes links to other documents as inline elements rather than appendices.

Each by itself was uninteresting. Together they made the web. By December 1990 he had implemented all three on a NeXT workstation in his office at CERN. The server was a program called httpd; the client was a program he called WorldWideWeb — both browser and editor in one window. The first page he served was a description of the project, at the address info.cern.ch. (The URL still resolves; the original page was reconstructed in 2013 and is still online.) On 6 August 1991 he posted to the Usenet newsgroup alt.hypertext a short message announcing the existence of the World Wide Web project and inviting other implementations. That message is the public birth of the web.

Fig 11.1 — March 1989 to August 1991 · a quiet two and a half years
Fig 11.1 — March 1989 to August 1991 · a quiet two and a half years CERN Building 31, third floor — one developer, one workstation, three protocols THE MACHINE NeXT NeXT cube 25 MHz · 8 MB RAM "This machine is a server" — do not power off THE FIRST WEBPAGE info.cern.ch/hypertext/ WWW/TheProject.html World Wide Web The WorldWideWeb (W3) is a wide- area hypermedia information retrieval initiative aiming to give universal access to a large universe of documents. See also: What's out there? [underlined links to other docs] Mar 1989 proposal "vague but exciting" Oct 1990 code begins URL · HTTP · HTML Dec 1990 first server live info.cern.ch Aug 1991 Usenet announce the web is public

The whole web in 1991 was a NeXT cube in Berners-Lee's office, a server program he had written, a browser-editor he had also written, and a few HTML files describing the project. The machine had a sticky note on its case: "This machine is a server. DO NOT POWER IT DOWN!!" The first publicly announced URL was info.cern.ch/hypertext/WWW/TheProject.html. By the time Berners-Lee posted to alt.hypertext on 6 August 1991, the protocols and the content had existed for nine months and one user had been using them. Two years later there were five hundred web servers. Five years later it was unstoppable.

"The Web is more a social creation than a technical one. I designed it for a social effect — to help people work together — and not as a technical toy."

— Tim Berners-Lee, Weaving the Web (1999)

Crucially, none of the three protocols required anyone else's permission. HTTP runs on top of TCP, which runs on top of IP, which runs on the existing physical internet. URLs needed no central registry — any owner of any host could just start serving documents under their own domain. HTML was just text; anyone could write it. There was no Web Consortium yet. There was no licence. There was no fee. The web spread because the cost of joining was zero and the value of having joined was that you could now find things that other people had decided to publish. Within five years of that Usenet post the number of web servers was doubling every four months. CERN had built a worldwide library by accident, while looking for a better way to organise lab notes.

The architectural choice that mattered

One decision separates the web from the dozen earlier hypertext systems that never caught on. Berners-Lee allowed dangling links. In Project Xanadu, every link required the target document to formally register itself with the link system; if the target moved or disappeared, the link broke and the publisher was supposed to update it. In the web, a link is just a URL; if the URL no longer resolves, the browser returns a 404 and the publisher is none the wiser. This sounds like a flaw — and famously it produces "link rot" as the web ages — but it was the price of decentralisation. A system that demands a working link to every published target is a system with a central authority. A system that tolerates 404s is a system anyone can join without asking.

The trade was correct. The web exists; Xanadu does not.

📜

The 1993 release. On 30 April 1993 CERN released the entire web — code, protocols, specifications — into the public domain, with a one-page document declaring no royalties were owed and no patents would be filed. That document is now in the CERN archive. It is, by some measures, the most economically significant piece of paper of the late twentieth century. Releasing the technology rather than licensing it foreclosed the possibility that any single company could capture it — and is the reason every later attempt to build a "private web" (AOL, MSN, Compuserve in their walled-garden phases) eventually surrendered.

02 — HTTP

Five lines of text. Forever.

HTTP is the simplest protocol in this book. It is so simple you can speak it by hand, with a keyboard, against a real web server, and watch the bytes come back. Open a TCP connection on port 80 to any web server, type GET / HTTP/1.0, press Enter twice, and the server will reply with a status line, a few headers, and the document. That transcript is HTTP. Three decades of evolution have added optional headers, content negotiation, persistent connections, multiplexing, compression, and finally a binary wire format — but the conversational shape "client asks, server answers" has not changed. HTTP/1.0 from 1996 still works against modern servers; you can still speak HTTP/1.1 over a plain socket by hand.
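Here is that transcript as a program rather than a keyboard session. A sketch: a plain TCP socket, port 80, the request typed out as bytes (example.com really does answer this):

import socket

s = socket.create_connection(("example.com", 80))
s.sendall(b"GET / HTTP/1.0\r\n"
          b"Host: example.com\r\n"
          b"\r\n")                        # the blank line ends the headers
reply = b""
while chunk := s.recv(4096):              # HTTP/1.0: the server closes when done
    reply += chunk
s.close()
print(reply.decode(errors="replace")[:300])   # status line, headers, then the HTML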

A request has a structure. The first line is the request line: a verb (called a method), a path, and a protocol version. After it come headers — one per line, each a name-colon-value pair, terminated by a blank line. After the blank line, optionally, comes a body (for methods like POST and PUT that send data). A response has the same shape: a status line with the protocol version, a numeric status code (200, 404, 500, …) and a short reason phrase, followed by headers, then a blank line, then the response body.

Fig 11.2 — A complete HTTP exchange · request and response, byte for byte
Fig 11.2 — A complete HTTP exchange · request and response, byte for byte CLIENT → SERVER GET /index.html HTTP/1.1 Host: example.com User-Agent: curl/8.4.0 Accept: text/html, */*;q=0.8 Accept-Encoding: gzip, br Accept-Language: en, fi Connection: keep-alive [blank line — header / body separator] (GET has no body) ↑ method · path · version ↑ headers — name: value, one per line ↑ blank line marks end of headers over TCP port 80 SERVER → CLIENT HTTP/1.1 200 OK Date: Fri, 01 May 2026 10:00:00 Server: nginx/1.24.0 Content-Type: text/html Content-Length: 1256 Cache-Control: max-age=3600 Connection: keep-alive [blank line] <!DOCTYPE html> <html>…1256 bytes…</html> ↑ version · status code · reason ↑ headers describe the body ↑ body — the actual document STATUS CODE FAMILIES 1xx — informational 100 Continue, 101 Switching Protocols 2xx — success 200 OK, 201 Created, 204 No Content 3xx — redirection 301 Moved Permanently, 304 Not Modified 4xx — client error 400 Bad Request, 401, 403, 404, 429 Too Many 5xx — server error 500 Internal Server Error, 502 Bad Gateway, 503 Unavailable first digit categorises; the rest just identifies. Three digits, ~70 codes assigned, hundreds reserved.

The entire HTTP wire format. Three things make a request: a request line (method, path, version), zero-or-more name-value headers, and an optional body — separated from the headers by a single blank line. The response has the same shape with a status line replacing the request line. The 200 above means "the request succeeded and the body is the requested document"; a 404 would mean "the document does not exist"; a 500 would mean "the server crashed trying." Every web request you have ever made has this exact shape underneath, even if a browser is hiding it from you.

The big idea: stateless

One design decision in HTTP echoes through the rest of the web's architecture. HTTP is stateless: the server treats every request as independent. There is no concept of a "session" at the protocol level. The server is not required to remember that you asked for /index.html ten seconds ago when you ask for /style.css now. Each request stands alone and contains everything the server needs to answer it.

This sounds inconvenient — and it is, for any application that needs to remember a logged-in user. Every workaround we use to fake state on top (cookies, session IDs, JWT tokens, OAuth flows) exists to paper over this fundamental statelessness. But the property is what made the web scale. A stateless server can handle requests from a million different clients in any order, on any thread, on any of a thousand servers behind a load balancer, without any of them needing to share memory. It is the reason the web survived Facebook becoming popular and Wikipedia becoming popular and YouTube becoming popular without requiring fundamental redesigns of the protocol. Statefulness scales with engineering effort; statelessness scales with money.

What actually happens when you press Enter

Type https://example.com/ into a browser and press Enter. Watching with a packet sniffer, the sequence of events is precise and ordered. The browser does not "open the page." It performs about twenty separate operations across four layers, and the page only appears at the end.

Fig 11.3 — One curl, fully traced
Fig 11.3 — One curl, fully traced curl https://example.com — every layer, in time order t = 0 ~80 ms ① DNS — turn "example.com" into 93.184.216.34 UDP query to OS resolver, often hits the local cache · ~5 ms warm, ~50 ms cold ② TCP — three-way handshake to 93.184.216.34:443 SYN, SYN-ACK, ACK · one round-trip · ~20 ms over a continent (Chapter 10) ③ TLS 1.3 — encrypted, authenticated channel ClientHello + key share → server certificate, key share, encrypted Finished verify cert chain, derive session keys · one round-trip · ~20 ms (we will pull this apart in §05) ④ HTTP — encrypted GET / HTTP/1.1 sent over the TLS channel to TLS this is just opaque bytes; to the server inside, it's the request from Fig 11.2 ⑤ HTTP — server sends 200 OK and the document body, encrypted by TLS curl decrypts, prints the body to stdout · one more half-round-trip ⑥ TCP close (or keep-alive idle for the next request) FIN, ACK, FIN, ACK — or pool the connection for the next GET

Eighty milliseconds of work between the moment you press Enter and the moment the page begins to paint. Most of it is round-trips: one for DNS, one for TCP, one for TLS 1.3, one for HTTP. The browser and server are exchanging little structured messages at every layer. From the wire, the HTTP request inside step ④ is indistinguishable from random bytes — TLS is doing its job. From the user's chair, all of this looks like "the page loaded." The whole rest of this chapter is the inside view of stages ①, ③, and ④ above.
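The stages can also be performed one at a time from the standard library, which makes the layering visible in a way curl hides. A sketch; certificate verification happens inside wrap_socket:

import socket, ssl

host = "example.com"
addr = socket.getaddrinfo(host, 443)[0][4][0]        # stage 1: DNS, name to address
raw = socket.create_connection((addr, 443))          # stage 2: TCP, SYN / SYN-ACK / ACK
ctx = ssl.create_default_context()                   # stage 3: TLS handshake plus
tls = ctx.wrap_socket(raw, server_hostname=host)     #          certificate verification
print(tls.version())                                 # e.g. 'TLSv1.3'
tls.sendall((f"GET / HTTP/1.1\r\nHost: {host}\r\n"   # stage 4: the request from Fig 11.2,
             "Connection: close\r\n\r\n").encode())  #          opaque bytes on the wire
page = b""
while chunk := tls.recv(4096):                       # stage 5: the encrypted 200 OK
    page += chunk
tls.close()
print(page.split(b"\r\n")[0])                        # b'HTTP/1.1 200 OK'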

The methods, and why some of them matter

A method is a verb. It tells the server what kind of operation the request represents. HTTP defines about a dozen, but four are dominant: GET (fetch a resource), POST (submit data, possibly creating a new resource), PUT (replace a resource with the supplied data), and DELETE (remove a resource). The choice of method is not arbitrary; it is part of the contract with caches, proxies, and CDNs. Two properties matter especially: safety (the request does not change the server's state — GET and HEAD are safe) and idempotency (repeating the request has the same effect as a single request — GET, PUT, and DELETE are idempotent; POST typically is not). Caches assume safety; retry logic assumes idempotency. A POST that should have been a PUT will, on a flaky network, occasionally charge a credit card twice.

Fig 11.4 — HTTP methods · what they promise, what caches and clients assume
Fig 11.4 — HTTP methods · what they promise, what caches and clients assume the verb tells everyone — caches, proxies, retry loops — what to do on failure METHOD USED FOR SAFE? IDEMPOTENT? CACHEABLE? GET read a resource yes yes yes HEAD like GET, headers only — used for cache validation yes yes yes POST submit · creates a new resource · process data no no rarely PUT replace a resource entirely with the body no yes no DELETE remove a resource no yes no PATCH apply a partial modification to a resource no no* no OPTIONS describe what methods/headers the server accepts (CORS preflight) yes yes no * PATCH is idempotent only when the patch format itself is — JSON Merge Patch is, JSON Patch generally is not. CONNECT, TRACE omitted — tunnel and debug methods, rarely visible to applications.

REST APIs lean on this table heavily: each HTTP method maps to one CRUD operation (GET=read, POST=create, PUT=update, DELETE=delete) and the method's properties are the contract with the rest of the network. A reverse proxy like nginx or Cloudflare can cache GETs aggressively because they are safe and idempotent; it must never cache POSTs because the same POST body sent twice may create two records. A retry library should resend GETs on failure but should be cautious with POSTs — the server may have processed the first one but failed before sending its 200. Idempotency keys (a header applications add to make POSTs effectively idempotent) exist to paper over that gap.

⚠️

The double-charge problem. Stripe's idempotency-key documentation (and every payment processor's equivalent) exists because of one specific failure mode: client sends POST /charge, server processes the charge, server crashes before sending the 200, client times out, client retries — and now the customer is charged twice. The fix is for the client to attach a unique Idempotency-Key header; the server records the key alongside the result of the first execution and, on a duplicate request with the same key, returns the stored result instead of re-executing. The mechanism turns a non-idempotent operation into an effectively idempotent one. Every serious payment, signup, or stateful POST in production runs through some version of this dance — because HTTP itself does not guarantee what its method properties promise; the network does that, and the network is not entirely reliable.
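The server side of the dance reduces to a lookup before the work and a store after it. A sketch in which every name is invented for illustration, not any particular processor's API; a real deployment persists the key-to-result table and scopes keys per client.

import uuid

results = {}                               # key -> stored outcome; a real store persists this

def charge(card, amount_cents, idempotency_key):
    if idempotency_key in results:         # a retry of a request we already finished
        return results[idempotency_key]    # replay the stored answer, never re-execute
    receipt = f"charged {card} {amount_cents} (txn {uuid.uuid4().hex[:8]})"
    results[idempotency_key] = receipt     # record the result before replying
    return receipt

key = uuid.uuid4().hex                     # the client picks one key per intent
first = charge("4242", 1999, key)
retry = charge("4242", 1999, key)          # the network flaked; the client resent
print(first == retry)                      # True: one charge, not two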

03 — DNS

The phone book of the internet.

HTTP needs a destination. The destination, on the wire, is an IP address — a 32-bit number on the IPv4 internet, 128 bits on IPv6. Humans do not remember 93.184.216.34. Humans remember example.com. Something has to translate the second into the first, and it has to do this billions of times per second across the planet, with sub-second latency, and almost never lie. That something is the Domain Name System — a hierarchical, replicated, cached, mostly decentralised distributed database, designed by Paul Mockapetris in 1983 (RFC 882, RFC 883, then refined into RFC 1034 and RFC 1035 in 1987). DNS is older than the web, older than HTTP, and older than most readers of this book. It is also, structurally, the most fragile critical piece of the modern internet — and the one most often abused.

Before DNS, the ARPANET kept a single text file called HOSTS.TXT at the Stanford Research Institute. Every machine on the network downloaded it periodically; it listed every other machine's name and IP. By the early 1980s the file had grown to thousands of entries, was being edited by hand, and was distributed by FTP. A typo at SRI could break name resolution for the entire network. Mockapetris's design replaced this with a tree: the responsibility for each piece of the name space is delegated to a different organisation, and each organisation runs the servers authoritative for its own piece. The translation of www.example.com is not done by one machine; it is done by a chain of them, each pointing the asker one step closer to the answer.

The hierarchy

Names in DNS are read right to left. The rightmost label is closer to the root of the tree; the leftmost is the leaf. A trailing dot — almost always omitted in writing — represents the root itself. So www.example.com. means: the root, then the com top-level domain, then example registered inside com, then a host called www inside example.com. Each level of the tree is run by different operators. The root is run jointly by twelve organisations (VeriSign, ICANN, university and government bodies) operating thirteen logically named servers (a.root-servers.net through m.root-servers.net) replicated across hundreds of physical locations worldwide via anycast. The com servers are run by VeriSign on contract with ICANN. The example.com servers are run by whoever owns example.com — a hosting company, a corporate IT department, a CDN.

Fig 11.5 — The DNS namespace · a tree, read right to left · every dot in a domain name is a hand-off to a different operator

The whole DNS namespace is one tree, roughly 360 million registered names wide at the second level. The root is run by twelve organisations cooperatively. Each top-level domain is run by an operator chosen by ICANN — VeriSign for .com and .net, Public Interest Registry for .org, a national authority for each country code. Each registered domain is run by whoever bought it. Each subdomain is run by them too, or by a CDN they delegated to. There is no single point of authority below the root. There is also no consensus mechanism — each operator is trusted to answer correctly for their own subtree. The system works only because each operator wants their own subtree to keep working.

How a name becomes an address

The mechanics are straightforward, and almost always cached. When your laptop wants to know example.com's IP, it asks its configured resolver — usually your home router, your ISP, or a public service like Cloudflare's 1.1.1.1 or Google's 8.8.8.8. The resolver may already have the answer in its cache; if so, it returns it in under a millisecond. If not, the resolver does the actual work, called recursion. It asks one of the root servers for example.com; the root replies, "I don't know, but the com servers do — here are their IP addresses." The resolver asks one of the com servers; com replies, "I don't know, but the example.com servers do — here are theirs." The resolver asks one of those; that server is authoritative for example.com and returns the actual IP. The resolver caches the answer for the duration specified by the authoritative server's TTL (typically 5 minutes to 24 hours), then returns it to the laptop. Total time: usually under 50 milliseconds the first time; under a millisecond on every subsequent lookup until the TTL expires.
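The whole transaction is a few lines from Node's built-in dns module — a sketch pointing at Cloudflare's public resolver, which performs the recursive walk and the caching described above (run as an ES module for the top-level await):

import { Resolver } from 'node:dns/promises';

const resolver = new Resolver();
resolver.setServers(['1.1.1.1']); // this resolver does the root → TLD → authoritative walk

// { ttl: true } returns the remaining cache lifetime alongside each address.
const answers = await resolver.resolve4('example.com', { ttl: true });
console.log(answers); // e.g. [ { address: '93.184.216.34', ttl: 3600 } ]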

Fig 11.6 — Recursive resolution · four servers, four questions, one answer · cold cache about 50 ms, then ~0 ms until the TTL expires

The resolver does the recursion; the laptop just asks once and gets the answer. The four-stage walk happens only on a cache miss; with reasonable TTLs and a busy resolver, hit rates are typically 80–99%. The whole protocol fits in tiny UDP datagrams — query and reply are usually under 100 bytes — which is why DNS is fast enough to feel free, and also why it has security problems: UDP makes it cheap to spoof.

Where the trust breaks

DNS was designed in 1983, before adversarial thinking became standard in network protocol design. The resolver believes whatever the authoritative server told it, signed by nothing. The query goes out as a UDP packet with a 16-bit transaction ID; whichever server replies first with the matching ID and a plausible answer is believed. If an attacker can guess the ID and beat the real reply to the resolver, the resolver caches the attacker's lie — and serves it to every user behind the resolver until the TTL expires. This is cache poisoning, and it has been known in principle since the 1990s. In 2008, security researcher Dan Kaminsky discovered a practical, fast variant that worked against essentially every DNS resolver in deployment. He disclosed it privately to vendors first; Microsoft, Cisco, ISC (the maintainers of BIND), and dozens of others released coordinated patches on a single day in July 2008. The patch did not fix the underlying weakness — UDP queries are still spoofable in principle — but added source port randomisation, multiplying the attacker's guessing space from 2¹⁶ to 2³² and pushing the attack from "a few minutes" to "thousands of years."

Fig 11.7 — Cache poisoning · the attacker races the real answer · DNS believes whoever replies first with a matching transaction ID

DNS cache poisoning in one picture. The resolver sends a query out; the attacker, who has guessed (or been told) a transaction ID, fires forged replies at the resolver from a spoofed source IP. If a forged reply with a matching ID arrives before the real one, the resolver caches the lie and serves it to every user behind it for the TTL duration — typically hours. Kaminsky's 2008 disclosure showed how to make this practical against unpatched resolvers in minutes. The fix — randomising the source port as well as the transaction ID — multiplied the attacker's search space ~65,000-fold and pushed the practical attack from minutes to centuries. DNSSEC is the deeper fix: cryptographically sign every record so resolvers can verify authenticity, not just guess less. DNSSEC has been deployed for two decades and still covers under half the global namespace, because the deployment cost is real and the perceived risk has fallen.

🛡️

The 2016 Dyn attack. On 21 October 2016, the Mirai botnet (~100,000 compromised IoT devices) directed sustained UDP floods at Dyn, a major DNS provider that hosted authoritative servers for Twitter, Netflix, Reddit, GitHub, Spotify, and dozens of others. For most of the morning on the US East Coast, those sites were unreachable — not because their own servers were down, but because nobody could resolve their names. DNS is a centralised dependency for half the internet's user-visible names. Take down a major DNS operator and a thousand sites go dark together. The attack ended when Dyn engineers manually reconfigured anycast routing to absorb the load; aftershocks ran for a week. The lesson — that even a "decentralised" tree has heavy concentration points — drove the modern push toward redundant DNS providers (multiple authoritative NS records pointing at independent operators) and the long, slow rollout of DNS-over-HTTPS, which puts the hop between your laptop and resolver out of an attacker's reach.

04 — Cryptography intro

The mathematics that makes the rest of the chapter possible.

Up to here, every protocol we have built — IP, TCP, HTTP, DNS — operates in cleartext. Anyone with access to any wire between you and the server can read every byte you send and every byte you receive, and can forge bytes pretending to be either side. This was acceptable when the internet was a few thousand academics. It is not acceptable when the internet is your bank account and your medical records. Solving this requires cryptography — and the next section, on TLS, assumes a working understanding of three primitives: hashing, symmetric encryption, and public-key cryptography. This section is a fast briefing. Chapter 14 takes them apart in mathematical depth. For now we want enough to read the TLS handshake.

One-way functions: the mathematical asymmetry that makes everything possible

Modern cryptography rests on operations that are easy to do in one direction and computationally infeasible to undo. The canonical example: multiplication. Take two large prime numbers, say 200-digit ones, and multiply them. A laptop does this in microseconds. Now hand someone the product alone — without telling them the primes — and ask them to find the factors. There is no fast algorithm. The best known methods take longer than the age of the universe for primes of that size. The asymmetry between "easy forward" and "hard backward" is the single mathematical lever that all of public-key cryptography pulls on.
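The asymmetry is easy to feel from any JavaScript console. A sketch with BigInt, using two small, well-known Mersenne primes standing in for the 200-digit ones:

// Forward direction: instant, even for numbers far larger than these.
const p = 2n ** 127n - 1n; // a 39-digit Mersenne prime
const q = 2n ** 89n - 1n;  // a 27-digit Mersenne prime
const n = p * q;           // microseconds
console.log(n.toString().length); // 66 — a 66-digit product

// Backward direction: given only n, recovering p and q has no known fast
// algorithm. At real key sizes (2048-bit n) the best classical attack, the
// General Number Field Sieve, needs longer than the age of the universe.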

Fig 11.8 — One-way function · easy forward, infeasible backward · multiplication is easy; factoring is not

The whole of RSA encryption hangs from one observation: if you choose two large primes and multiply them, anyone can use the product (the public key) to encrypt a message — but only someone who knows the original primes (the private key) can decrypt it. The privacy is not protected by secrecy of the algorithm; the algorithm is published. It is protected by the computational gap between forward and reverse. Quantum computers, in principle, can close this specific gap (Shor's algorithm, 1994); this is why the post-quantum cryptography effort matters. Chapter 14 unpacks the actual modular arithmetic; here, the picture is enough.

Hashing: the digital fingerprint

A cryptographic hash function is the simplest one-way operation. Feed in any data — a paragraph, a movie, a database — and the function returns a fixed-length string of bits (256 bits for SHA-256). Three properties matter: determinism (the same input always produces the same hash), preimage resistance (given the hash, you cannot practically find any input that would produce it), and the avalanche effect (a single-bit change in the input produces a completely different output, with no predictable relationship). The hash is, in effect, a fingerprint of the data — uniquely identifying it without revealing it. SHA-256 in particular is the workhorse of modern systems: it is what Bitcoin uses, what certificate authorities use, and what every TLS handshake uses; Git does the same job with its older cousin SHA-1 to identify every commit.

Fig 11.9 — SHA-256 avalanche · one bit changed, all 256 bits scrambled · flip one bit of input and expect ~50% of the output bits to flip, every time

The avalanche property is what makes SHA-256 useful for tamper detection. Two inputs that differ by a single bit produce hashes with no recognisable relationship — there is no algebraic shortcut, no "hash patch" you could add to one to get the other. If you publish the hash of a file alongside the file, anyone who downloads it can re-hash and compare; any modification, even a single bit corrupted in transit or substituted by an attacker, produces a completely different hash. The same property powers Bitcoin's mining (find an input whose hash starts with N zeros), Git's commit IDs (a hash — SHA-1 in Git's case — of the tree plus the parent commit plus the commit message), and the Merkle trees that Certificate Transparency logs use to bind one signed root to millions of issued certificates. Cryptographic hashes are the universal "this is exactly what I sent" check.
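The avalanche effect is a one-liner to watch with Node's built-in crypto module — two sentences differing by a single letter:

import { createHash } from 'node:crypto';

const sha256 = (s) => createHash('sha256').update(s).digest('hex');

console.log(sha256('The quick brown fox jumps over the lazy dog'));
// d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592
console.log(sha256('The quick brown fox jumps over the lazy cog'));
// one letter changed → a completely unrelated 64-character digest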

Symmetric vs asymmetric: who has the key

Encryption comes in two flavours, and TLS uses both. Symmetric encryption uses a single shared secret key: both parties have it, both can encrypt and decrypt with it, anyone who gets a copy can read everything. The flagship algorithm is AES (Advanced Encryption Standard, NIST 2001), which encrypts 128-bit blocks using a 128-, 192-, or 256-bit key in 10, 12, or 14 rounds of byte-level mixing, depending on key size. AES is fast — your CPU has hardware instructions for it (AES-NI on Intel/AMD, equivalent on ARM); a modern laptop encrypts gigabytes per second per core. The hard part is not the encryption itself; it is getting the shared key into both parties' hands without anyone watching.

Asymmetric (public-key) encryption solves exactly that bootstrap problem. Each party has a key pair: a public key they share with the world, and a private key they keep secret. Data encrypted with the public key can only be decrypted with the matching private key. Two strangers who have never met can establish a shared secret over a public channel: send each other their public keys, do a small mathematical dance (Diffie-Hellman, 1976), and both end up knowing a shared value that no eavesdropper can derive even with full transcripts. Public-key operations are slow — RSA is a thousand times slower than AES for the same data volume, ECDH is faster but still slower — so in practice we use public-key cryptography only to agree on a symmetric key, and then encrypt the actual conversation with AES. This is the core insight that makes TLS practical.

Fig 11.10 — Symmetric vs asymmetric · one shared key, or two halves of one secret · one is fast, the other solves the bootstrap problem

The two flavours have complementary strengths. Symmetric encryption is fast; everyone needs the same key. Asymmetric encryption solves the key-distribution problem; it is much too slow to encrypt a video call. The trick that powers TLS — and SSH, and Signal, and PGP, and essentially every encrypted protocol shipped since the 1990s — is to use the slow asymmetric primitive only to agree on a shared symmetric key, and then run the bulk of the conversation through fast symmetric encryption with that key. The next section traces this exact dance through a real TLS 1.3 handshake.
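Node's crypto module can run the whole dance in a dozen lines — a sketch with both "parties" in one process for brevity: ephemeral X25519 key agreement, HKDF to turn the shared secret into an AES key, then AES-256-GCM for the bulk data.

import { generateKeyPairSync, diffieHellman, hkdfSync, randomBytes, createCipheriv } from 'node:crypto';

// Each side generates an ephemeral key pair and sends the other its public half.
const alice = generateKeyPairSync('x25519');
const bob   = generateKeyPairSync('x25519');

// Both compute the same shared secret; an eavesdropper holding both public keys cannot.
const secret = diffieHellman({ privateKey: alice.privateKey, publicKey: bob.publicKey });
const check  = diffieHellman({ privateKey: bob.privateKey, publicKey: alice.publicKey });
console.log(secret.equals(check)); // true

// The slow asymmetric part is done. Derive a symmetric key and encrypt the bulk data fast.
const key = Buffer.from(hkdfSync('sha256', secret, Buffer.alloc(32), 'demo', 32));
const iv = randomBytes(12);
const cipher = createCipheriv('aes-256-gcm', key, iv);
const ciphertext = Buffer.concat([cipher.update('the actual conversation'), cipher.final()]);
const tag = cipher.getAuthTag(); // authenticity check, as AES-GCM provides in TLS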

05 — TLS

One round trip · two strangers · a perfect channel.

TLS — Transport Layer Security — is the protocol that turns the cleartext byte stream of TCP into an encrypted, authenticated one. It is what the lock icon in your browser refers to. It is the protocol that runs underneath HTTPS, IMAPS, SMTPS, and roughly every "S"-suffixed protocol since the late 1990s. Its lineage starts with SSL 1.0 (Netscape, 1994, never released because it was discovered to be broken before launch), through SSL 2.0 (1995, broken), SSL 3.0 (1996, eventually broken via POODLE in 2014), then renamed TLS 1.0 (1999), TLS 1.1, TLS 1.2 (2008, dominant for a decade), and finally TLS 1.3 (RFC 8446, August 2018) — the version your browser uses today, the one that is actually clean.

The protocol's job is simple to state: two parties who have never met should end up sharing a secret key, mutually authenticated, with full confidence that no third party listening to or manipulating the conversation can read or alter the result. The mechanism is breathtakingly compact in TLS 1.3 — one round trip, not the two of TLS 1.2 — and combines every primitive from §04: a key exchange to bootstrap a shared secret, a digital signature over a certificate chain to authenticate the server, and symmetric encryption to actually carry the data once the handshake is done. The whole thing finishes in about 20 milliseconds on a continental link.

The TLS 1.3 handshake

Fig 11.11 — TLS 1.3 · one round trip from "hello" to encrypted data · RFC 8446

① ClientHello — supported ciphers, supported groups, client_random, key_share (an ephemeral X25519 public key). "Here is half of a Diffie-Hellman, plus what I can do."
② ServerHello + {EncryptedExtensions, Certificate, CertificateVerify, Finished} — server_random, the server's ephemeral key_share, the chosen cipher_suite, the X.509 certificate chain, and a signature over the handshake transcript made with the server's private key. "Here is my half. Verify my certificate chain. The rest is encrypted." Both sides can now compute the same shared key — ECDH(client_priv, server_pub) = ECDH(server_priv, client_pub) — and derive the AES keys and IVs via HKDF.
③ {Finished} — a MAC over the entire handshake transcript, encrypted with the new keys; proves the client derived them too.
④ Application data — HTTP/1.1 GET / and the rest of the conversation, encrypted and authenticated with AES-GCM, ChaCha20-Poly1305, etc.

The whole TLS 1.3 handshake: one round trip. The client sends a hello with half of an ephemeral Diffie-Hellman key exchange (an X25519 public key) plus the list of cipher suites it supports. The server replies with the other half plus its X.509 certificate plus a signature, with most of that already encrypted under the freshly derived shared key — possible because both sides can compute the shared key as soon as they see each other's public halves. The client verifies the certificate against its trusted root store (Fig 11.12), checks the signature, and replies Finished. From this moment forward, every byte is encrypted with AES-GCM (or ChaCha20 on phones without AES hardware) and authenticated against the agreed key. TLS 1.2 needed two round trips to do the same job; TLS 1.3 saved one round trip by being smarter about message ordering.
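The negotiation result is directly observable from Node (a sketch; example.com stands in for any modern host):

import { connect } from 'node:tls';

const socket = connect({ host: 'example.com', port: 443, servername: 'example.com' }, () => {
  console.log(socket.getProtocol());    // e.g. 'TLSv1.3'
  console.log(socket.getCipher().name); // e.g. 'TLS_AES_256_GCM_SHA384'
  console.log(socket.authorized);       // true — the certificate chain verified
  socket.end();
});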

The certificate chain

The handshake includes a step the diagram glossed over: the client "verifies the certificate." What does that actually mean? The server's certificate is a small file containing the server's public key, the hostnames it claims to serve, validity dates, and — critically — a digital signature by some certificate authority the client already trusts. The certificate authority's own certificate is signed by another, more trusted authority. That one is signed by another. The chain ends at a root certificate — a self-signed certificate that the client trusts because it was shipped with the operating system or browser. There are roughly 150 such roots in the major stores; they are run by companies like DigiCert, Let's Encrypt, GlobalSign, Sectigo, and Google. The whole system rests on the trust placed in those ~150 organisations.

Fig 11.12 — The certificate chain · why your browser trusts a stranger · three signatures, walked from leaf to root, must all verify. Failure modes — hostname mismatch, expiry, invalid signature, root not in the trust store, revocation (CRL/OCSP) — all end in a browser warning page.

A web server presents not one certificate but a chain. The leaf certificate is for the actual hostname (example.com); it is signed by an intermediate CA's private key. The intermediate's certificate is signed by a root CA's private key. The root is self-signed and shipped in your browser's trust store. The client walks the chain from leaf to root, verifying every signature; if any link fails, the connection is aborted with a scary browser warning. The whole system rests on ~150 root operators behaving themselves — and on the Certificate Transparency infrastructure (mandatory since 2018) that publicly logs every certificate issued, so misbehaviour becomes detectable. The 2011 DigiNotar compromise — a Dutch CA whose private keys were stolen and used to issue fraudulent *.google.com certificates — destroyed the company within weeks and was the event that drove modern transparency requirements.
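The walk itself is visible from Node: getPeerCertificate(true) returns the leaf with a linked issuerCertificate chain. A sketch, covering the chain the server actually sent:

import { connect } from 'node:tls';

const socket = connect({ host: 'example.com', port: 443, servername: 'example.com' }, () => {
  let cert = socket.getPeerCertificate(true); // true → include the full chain
  while (cert && Object.keys(cert).length > 0) {
    console.log(`${cert.subject.CN} ← signed by ${cert.issuer.CN}`);
    if (cert.issuerCertificate === cert) break; // self-signed root: issuer is itself
    cert = cert.issuerCertificate;
  }
  socket.end();
});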

Forward secrecy: protecting yesterday's data

One subtlety in the TLS 1.3 design deserves explicit attention. The key exchange uses ephemeral Diffie-Hellman keys — fresh random values generated at the start of each connection, never reused, discarded the moment the connection ends. This property is called forward secrecy, and it has a profound consequence: even if an attacker records every encrypted byte of every TLS connection your server has ever served, and then years later steals your server's private key, they still cannot decrypt any of those past connections. The private key authenticated the server during the handshake, but it did not derive the session key. The session key was derived from ephemeral material that no longer exists anywhere.

Fig 11.13 — Forward secrecy · stealing the long-term key tomorrow does not unlock yesterday · a recorded session, decades later, is still unreadable

Forward secrecy is the property that recorded encrypted traffic stays encrypted even if the long-term server key is later compromised. The server's private key signs the handshake to authenticate; it does not decrypt the data. Decryption depends on the ephemeral keys, which are generated fresh per session and discarded after. An attacker recording all your bank's TLS traffic for ten years and then stealing the bank's private key gets nothing — every session's keys are gone. This property is mandatory in TLS 1.3 (and was optional but common in TLS 1.2). One caveat: forward secrecy protects against key theft, not against cryptanalysis. A future quantum computer running Shor's algorithm could break the recorded ephemeral key exchange itself and re-derive the session keys — which is why "store now, decrypt later" archiving of TLS traffic is taken seriously, and why post-quantum key exchanges are now being folded into the TLS handshake.

06 — HTTP/2 & HTTP/3

The two upgrades that broke compatibility on purpose.

HTTP/1.1 from 1999 was good enough for nearly two decades. Then mobile networks happened, JavaScript-heavy sites happened, the average web page grew from 50 KB to 3 MB, and the protocol's serial-request model started costing real time. Two new versions followed — HTTP/2 in 2015 and HTTP/3 in 2022 — each addressing a specific bottleneck the previous one couldn't. They are not "new HTTPs"; they are new wire formats for the same request-response semantics. The headers, methods, and status codes you saw in §02 are unchanged. What changes underneath is how those messages are serialised onto the network, multiplexed, and transported.

HTTP/2: binary, multiplexed, on the same TCP connection

The headline problem with HTTP/1.1 was head-of-line blocking at the application layer. A browser opening a typical page needed to fetch dozens of resources — HTML, then 10 stylesheets, then 30 JavaScript files, then 50 images. HTTP/1.1 sent these one at a time down a single TCP connection (with at most six parallel connections per host as a workaround). A slow response held up everything queued behind it. HTTP/2 (RFC 7540, 2015 — derived from Google's SPDY) replaced the text-based wire format with a binary one and introduced streams: many independent request/response pairs interleaved on the same TCP connection, each chopped into binary frames tagged with a stream ID. The server can deliver frame 4 of stream A, then frame 1 of stream B, then frame 5 of stream A, in any order — and the client reassembles each stream from its frames. Sixty parallel requests now share one connection, with no head-of-line blocking at the HTTP layer.

Fig 11.14 — HTTP/1.1 vs HTTP/2 · serial vs multiplexed on the same TCP connection · six requests, one connection — two ways to schedule them

The same six resources, fetched two different ways. HTTP/1.1 sends one request, waits for the full response, then sends the next; a slow resource holds up everything queued behind it. HTTP/2 chops every request and response into binary frames tagged with a stream ID and interleaves them on a single TCP connection — small responses can fly past slow ones, and the browser reassembles each stream as its frames arrive. Same TCP, same TLS, same HTTP semantics — different wire format. Real-world page-load improvements range from 10% to 50% depending on resource shape. Server push (the protocol's other headline feature) was added and then quietly abandoned — Chrome removed support in 2022 — because it never delivered the predicted benefits.
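Node's http2 module shows the multiplexing directly — several requests as independent streams on one session (a sketch; example.com and the paths are stand-ins):

import { connect } from 'node:http2';

const session = connect('https://example.com');
const paths = ['/', '/style.css', '/app.js']; // hypothetical resources
let pending = paths.length;

for (const path of paths) {
  const stream = session.request({ ':path': path }); // one stream each, one TCP connection
  stream.on('response', (headers) => console.log(path, headers[':status']));
  stream.resume(); // drain the body
  stream.on('close', () => { if (--pending === 0) session.close(); });
}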

HTTP/3: still TCP's fault

HTTP/2 solved head-of-line blocking at the HTTP layer. It did not solve it at the TCP layer. TCP delivers a single ordered byte stream; if one packet is dropped, every byte after it is held back until the missing packet is retransmitted — even if those later bytes belong to entirely different HTTP/2 streams that have no logical dependency on the missing data. On a clean network the cost is invisible. On a flaky mobile connection — 5% packet loss, intermittent coverage — the TCP-layer head-of-line blocking dominates page load time. The fix had to happen below HTTP, at the transport layer itself.

QUIC (Quick UDP Internet Connections, RFC 9000, 2021) replaces TCP entirely for HTTP traffic. It runs on top of UDP — that "fire and forget" protocol from §01 of Chapter 10 — and reimplements everything TCP did (reliable ordered delivery, flow control, congestion control, retransmission) plus everything TLS did (encryption, authentication) plus the multiplexing of HTTP/2 — but with one crucial change: each HTTP/2-style stream is independently ordered, so a lost packet on stream A no longer holds back stream B. QUIC was developed by Google starting around 2012, deployed inside Chrome and to Google's servers around 2016, and standardised by the IETF in 2021. HTTP/3 (RFC 9114, 2022) is just HTTP semantics over QUIC. As of 2026 it handles a third of all web traffic — almost everything to and from Google, Cloudflare, and Meta — and is rapidly catching up to HTTP/2 everywhere else.

Fig 11.15 — HTTP/3 over QUIC · escaping TCP at the protocol layer · three layers of the legacy stack collapse into one, built on UDP

HTTP/3 is what you get when you take HTTP/2's good ideas and refuse to inherit TCP's bad ones. The three independent layers of HTTP/2 (HTTP, TLS, TCP) collapse into one (HTTP/3 over QUIC), which itself runs on UDP because UDP, in §01 of Chapter 10, was the protocol that didn't get in the way. The handshake is one round trip in the cold case, zero in the resumed case. Each stream has its own ordering, so packet loss on one image's bytes doesn't stall a different image's bytes. The catch — the reason this transition has taken a decade — is that QUIC is implemented in user space rather than the kernel, every server and CDN had to rewrite its transport layer, and corporate firewalls had to learn to allow UDP/443. The transition is happening, slowly, the way the IPv6 migration in Chapter 9 happens: not with a flag day, but with a steady year-on-year shift of the largest operators dragging the rest of the network behind them.
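Node has no stable QUIC client yet, but the discovery mechanism is easy to observe: servers advertise HTTP/3 in the Alt-Svc response header, and browsers upgrade their next request to QUIC on UDP/443. A two-line sketch with the built-in fetch (run as an ES module; cloudflare.com is just a convenient host that speaks h3):

const res = await fetch('https://cloudflare.com/');
console.log(res.headers.get('alt-svc')); // e.g. h3=":443"; ma=86400 — "I also speak HTTP/3"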

🔁

The recurring pattern. Every protocol in this chapter follows the same arc. Berners-Lee's HTTP — minimal, text-based, perfect for 1991 — got progressively pushed past its limits and replaced with denser, harder, more cryptographic versions. DNS — designed in 1983 with no adversarial model — got progressively patched with port randomisation, DNSSEC, DoH, DoT. TLS — born as Netscape's SSL 1.0, never released because broken — went through six numbered versions before reaching the clean 1.3 most browsers now use. None of these protocols was correctly designed at first issue; all of them earned their current shape from twenty-plus years of attacks and patches. This is especially true of network protocols, because their designers cannot iterate quickly: every change has to be deployed across millions of independently operated machines without breaking the existing ones. The slow grace of protocol evolution is one of the more remarkable forms of engineering humans have ever managed at planetary scale.

The seam to Chapter 12

Chapter 11 has built the web's transport. We can now send a typed URL through a verified, encrypted tunnel; we can negotiate a fresh shared key with a stranger every time; we can name and find any machine on Earth by composing a few labels. What we have not yet done is run code inside the document we just fetched. HTML describes a structure of text and images. The page is alive only because something else — a programming language baked into every browser — animates it. That language is JavaScript. It was invented in ten days. It runs everywhere. It powers the most popular development platform in computing. And it never should have worked. Chapter 12 is its story.

Chapter 12

JavaScript
The Language That
Shouldn't Have Worked

Brendan Eich was given ten working days, in May 1995, to design a scripting language for the Netscape browser. The result — committed to a release branch before it had a stable name and shipped before it had a specification — is now the most widely deployed programming language in history. It runs in every browser. It runs on servers via Node.js. It runs in every database that speaks JSON. The fact that this happened is a triumph of practical engineering over good taste, and the fact that it works is a series of architectural decisions made under deadline that turned out to be unexpectedly durable.

Topics Event loop · Promises · V8 · Node · DOM · same-origin · XSS · CSP
Era covered 1995 → present
Chapter 12 hero · JavaScript — The Language That Shouldn't Have Worked · Mocha, May 1995 · LiveScript, Sep 1995 · JavaScript, Dec 1995 — three names, one language, all of the web
01 — The browser problem

HTML can describe; it cannot do.

By 1994 the web that Berners-Lee had launched in Chapter 11 was growing exponentially — but it was growing as a system of static documents. A web page in 1994 was an HTML file: text, images, links, perhaps a form that POSTed somewhere. Click a link, the browser fetched a new page; submit a form, the server replied with another page. Every interaction was a full round trip and a full re-render. For a research-paper repository this was fine. For anything closer to an application — a calculator, a stock quote that updates, validation of a form before submission, a clock — it was hopeless. The browser needed to be able to run code.

The first attempt was Sun Microsystems' Java applet. In late 1994 Sun announced HotJava, a browser written in Java that could embed small Java programs (applets) inside web pages. Each applet ran inside a Java Virtual Machine, isolated from the host, with a defined API for drawing and user input. Conceptually it was the right answer — a real, type-safe, sandboxed language with a proper runtime. In practice it was hopeless for the early web. The JVM took five to ten seconds to start. Applets were heavyweight, hard to write in small pieces, and required the server to host compiled .class files. Every applet was a separate island; you couldn't easily reach into the surrounding HTML page from inside Java, or vice versa. Applets shipped, were used for games and animation, and slowly died: by 2000 they were a niche, by 2010 obsolete, by 2017 removed from browsers entirely.

Netscape — the company whose browser had become the market-share leader through 1994 and 1995 — wanted something different. Marc Andreessen, Netscape's twenty-three-year-old co-founder, believed the web's killer feature would be the ability to write quick scripts that lived inside the page itself — small, untyped, easily embedded snippets that any HTML author could add without compilers, without classpaths, without ten seconds of JVM warm-up. The language should look enough like Java that the buzzword "Java" could be used in marketing, but should be far simpler. It should run instantly. It should fail loudly but not crash the browser. It should be easy enough that a designer who had learned HTML over a weekend could pick it up. Netscape needed it shipped in the upcoming Netscape 2.0 release. The release window was eleven weeks away.

Fig 12.1 — The web in 1994 · static documents and a tab-out to Java · two ways to make a page do something — neither was working

By 1994 the two paths to interactive web pages had failed. Pure HTML could describe a form but every interaction round-tripped to the server. Java applets could run real code but they took ten seconds to warm up, were hard to embed, and lived in a sealed island that couldn't easily talk to the surrounding page. Netscape's bet was that the right answer was something a designer could write in five lines, embedded directly between <script> tags, that ran the moment the page loaded. They needed the language designed and shipped in eleven weeks. They hired Brendan Eich.

02 — Eich 1995

Ten days. Three names. One language.

Brendan Eich joined Netscape in April 1995. He was 33, a programming language enthusiast, and had been promised — when the recruiters pitched him — that he would get to work on bringing Scheme (a Lisp dialect) to the Netscape browser. What he got instead, on his first week, was a different brief: Marc Andreessen and Bill Joy (Sun co-founder, brought into the Netscape-Sun partnership) wanted a language with C-like syntax — to look familiar to working programmers — but with the dynamic, interpreted, untyped flexibility of Scheme or Self underneath. They wanted it to embed inside HTML. They wanted it to ship with Netscape 2.0, currently scheduled for September. Eich had ten working days to produce a prototype.

He delivered. The first version, internally called Mocha, was a working interpreter by mid-May 1995. The language Eich actually designed was — under the C-like syntax — a deeply Scheme-influenced lexically-scoped language with first-class functions, closures, and prototypal inheritance taken from David Ungar and Randall Smith's Self. It had no classes (those came twenty years later, as sugar). It had no integers (only floating-point numbers, IEEE 754 doubles all the way). It had implicit type coercion that anyone who has used JavaScript has cursed. It was a pragmatic compromise: genuine elegance underneath surprising syntactic awkwardness — the shape you would expect of a language born in ten days under the wrong brief.

Then came the naming. Mocha was the internal name. In September 1995, shortly before launch, Netscape renamed it LiveScript. Then, in December 1995, with great fanfare, Netscape announced a partnership with Sun Microsystems and renamed it again, to JavaScript. The name change was pure marketing: Netscape wanted to ride Sun's Java publicity. The two languages were and remain unrelated — JavaScript shares almost nothing with Java besides the C-derived syntax that hundreds of languages share — but the name stuck, and three decades later it is the dominant programming language by raw deployment count and the source of endless confusion for first-time learners. Eich himself has said publicly that the name was the worst part of the whole project.

Fig 12.2 — May 1995 to ECMAScript 2026 · the slow legitimisation · a marketing rename, a standardisation truce, and three decades of catching up

JavaScript's first decade was a slow walk from "thrown together for a release deadline" to "actually standardised." ECMA International became the neutral standards body in 1997 because Netscape did not own the trademark "Java" (Sun did) and the language needed a non-Netscape home. Microsoft shipped a near-clone called JScript in Internet Explorer; the standardisation kept the two compatible enough that web pages worked in both. The really transformative version was ES6 in 2015 — twenty years after the original — which finally added classes, modules, native Promises, and arrow functions. Since 2015 the language has shipped a yearly small release; the days of waiting six years for the next ECMAScript are over.

"I had to be done in ten days or something worse than JS would have happened."

— Brendan Eich, on the early Netscape design
⏱️

The ten-day legacy. Several JavaScript quirks trace directly to the original ten-day window. typeof null === "object" is a bug that became compatibility-locked: Eich described it as a "leftover" he never got to fix. 0.1 + 0.2 !== 0.3 is just IEEE 754 (Chapter 2.6), but the choice to have no integer type at all — every number is a 64-bit float — meant JavaScript could never expose the difference. Implicit coercion ([] + {} === "[object Object]") is a pragmatic shortcut that did not get a sober second look. None of these is unfixable in principle; all of them are unfixable in practice, because billions of pages on the live web rely on the exact existing behaviour. The shape JavaScript will have in fifty years is the shape it has today, plus careful additions; it cannot remove anything.
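Each of these is reproducible in any console:

console.log(typeof null);       // "object" — the ten-day leftover, compatibility-locked
console.log(0.1 + 0.2);         // 0.30000000000000004 — every number is an IEEE 754 double
console.log(0.1 + 0.2 === 0.3); // false
console.log([] + {});           // "[object Object]" — implicit coercion at work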

03 — The event loop

One thread. Never blocking. Forever in motion.

JavaScript runs on a single thread. There is one call stack, one execution context. There is no fork(), no pthread_create(), no parallel Java-style worker threads sharing variables. This sounds like a fatal weakness — and would be, if JavaScript ever blocked. It does not. Every operation that might take time — fetching a URL, reading a file, waiting for a timer — returns immediately and arranges to be told when the work is done. The thread, freed from waiting, picks up the next ready callback and runs it. The mechanism that orchestrates this is the event loop. It is the architectural choice that makes JavaScript work for browsers, and it is the choice that Node.js exported back to the server world.

The model has four moving parts. The call stack holds the function frames currently executing — the same stack Chapter 3 dissected, in JavaScript form. The macrotask queue (also called the callback queue) holds callbacks ready to run from completed I/O, expired timers, and DOM events. The microtask queue holds callbacks from settled Promises and from queueMicrotask(); it is drained completely after every macrotask, before any rendering. And the render step is when the browser, between macrotasks, recomputes layout and paints any visual changes. The event loop's job is to coordinate them: pull one task off the macrotask queue, run it to completion, drain all microtasks it produces, optionally render, repeat. Forever.

Fig 12.3 — The event loop · stack, microtasks, macrotasks, render · one thread, three queues, ~60 turns per second

while (true) {
  task = macrotaskQueue.shift();  // one macrotask per iteration
  run(task);                      // to completion, on the call stack
  drainMicrotasks();              // every Promise.then, every queueMicrotask
  maybeRender();                  // layout · paint · composite, ~16 ms cadence
}

The whole algorithm. Pull one macrotask off the queue (a click handler, a timer, a network response). Run it on the call stack. When it returns, drain the microtask queue completely — every .then, every awaited promise resolution. Then let the browser render if it's been ~16 ms since the last frame. Repeat. The single thread never blocks because nothing on the queues blocks; long-running computation (a heavy JSON.parse, a tight loop) does block the loop, which is why frozen UI in browsers and event-loop lag in Node.js are the same bug — the thread is stuck on the call stack instead of returning to the loop.
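The ordering rules are easy to verify — a four-line experiment whose output follows directly from Fig 12.3:

console.log('1: synchronous');                              // runs on the current stack
setTimeout(() => console.log('4: macrotask'), 0);           // queued for a later loop turn
Promise.resolve().then(() => console.log('3: microtask'));  // drained once the stack empties
console.log('2: synchronous');
// prints 1, 2, 3, 4 — stack first, then all microtasks, then the next macrotask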

Three eras of asynchrony

The event loop has been the same since 1995. What has changed three times is the way JavaScript code spells "do this when the I/O is done." The first era was callbacks: pass a function to whoever does the work, and it will call your function back when finished. Simple, and the source of the famous callback pyramid of doom — three or four callbacks nested inside each other, indentation marching across the screen, error handling impossible. The second era was Promises (standardised in ES6, 2015): an object representing a future value, with .then() and .catch() chained horizontally. The third era is async/await (ES2017): syntactic sugar that lets you write asynchronous code as if it were synchronous — const x = await fetch(url); — while the compiler invisibly transforms it into a Promise chain underneath. All three eras coexist in any modern codebase, because legacy never dies; understanding all three is one of the shibboleth skills of working in the language.

Fig 12.4 — The same fetch · three eras of async syntax · underneath, the event loop is identical — only the surface syntax has changed

ERA 1 — CALLBACKS (1995 →)
getUser(id, function (user) {
  getOrders(user, function (orders) {
    getInvoice(orders[0], function (invoice) {
      render(invoice); // pyramid of doom · errors lost
    });
  });
});

ERA 2 — PROMISES (ES6, 2015)
getUser(id)
  .then(user => getOrders(user))
  .then(orders => getInvoice(orders[0]))
  .then(render)
  .catch(err => alert(err)); // single shared error path

ERA 3 — ASYNC / AWAIT (ES2017)
async function showInvoice(id) {
  try {
    const user = await getUser(id);
    const orders = await getOrders(user);
    const invoice = await getInvoice(orders[0]);
    render(invoice);
  } catch (err) {
    alert(err);
  } // looks synchronous · isn't
}

Three ways to spell the same thing. All three compile, eventually, to operations on the macrotask and microtask queues from Fig 12.3 — async/await is sugar over Promises, Promises are sugar over callbacks, and callbacks were the bare metal. The underlying mechanism never changed; the language did. Most JavaScript a working developer touches today uses async/await for control flow, Promises for combinators (Promise.all, Promise.race), and old-style callbacks only at the lowest layers (DOM events, Node.js APIs that predate Promises). Reading older code or an Express middleware function still requires fluency in all three.

04 — V8 and Node.js

An interpreter that pretended to be a compiler · then a runtime that escaped the browser.

JavaScript was originally interpreted line by line by Netscape's SpiderMonkey engine — fast enough for the kind of small validation script and DOM manipulation people wrote in 1996, hopelessly slow for anything resembling an actual application. By 2008, when Google released V8 alongside the first Chrome browser, JavaScript engines had become aggressive optimising compilers masquerading as interpreters. V8's design — and the design of every major engine since — is a four-stage pipeline that takes source code to fast native machine code, with a feedback loop that re-optimises whichever functions turn out to be hot.

Fig 12.5 — V8 · from source to optimised native code, with a feedback loop · parse, interpret, profile, optimise — and deoptimise the moment a type assumption is violated

V8's two-tier architecture mirrors Java's HotSpot, Lua's LuaJIT, and most modern dynamic-language engines. The bytecode interpreter (Ignition) is fast to start up — code begins running immediately, no compilation pause — but slow per operation. The JIT compiler (TurboFan) is slow to compile but produces machine code competitive with C. The whole pipeline runs concurrently with the program: hot functions get optimised in a background thread, and the running program is patched mid-execution to point at the new optimised version. Speculative type assumptions ("this counter has always been an integer; assume integer") deliver most of the speedup; broken assumptions trigger deoptimisation, which throws the optimised code away and falls back to the bytecode interpreter. The whole machinery is invisible to user code — until you write a function that intermittently violates V8's expectations and notice it suddenly running at 1/10 speed.

Node.js: V8 outside the browser

In 2009 a developer named Ryan Dahl watched the progress bar on a file upload in a browser and realised the standard server-side approaches — Apache forking a process per request, blocking on disk I/O — were exactly the wrong shape for systems with many slow connections. The browsers had already solved this problem: single-threaded, event-loop, never-blocking. Why not move that model to the server? He took V8 — Google's then-new JavaScript engine — stripped away the browser, added a C library called libuv that provided a cross-platform asynchronous event loop and file/network I/O, and bound them together with a small JavaScript standard library. He called it Node.js and showed it at JSConf EU in Berlin that November. The community reaction was immediate.

Within five years Node had become the default backend for Silicon Valley startups. Within ten years it had taken a large share of the server-side landscape. The reasons are practical, not aesthetic: a single-threaded async-I/O model handles thousands of slow connections (the WebSocket connections of a chat app, the open HTTP requests of a real-time feed) on a single CPU core where a thread-per-request model would have collapsed; and sharing one language between browser and server cut the cognitive cost of building web applications roughly in half. Node is not the best server for everything (CPU-bound workloads are still better served by Go, Rust, or threaded Java), but for the sweet spot of "many connections, mostly waiting on I/O" it is hard to beat.

Fig 12.6 — Node.js · V8 + libuv + a JavaScript stdlib · three pieces glued together — one came from Chrome, one wraps the operating system

Node is small in concept and large in consequence. V8 (originally written for Chrome) executes the JavaScript. libuv (built for Node specifically, to add Windows support on top of the libev/libeio Unix codebase Dahl had been using) handles the OS-specific async I/O — epoll on Linux, kqueue on macOS/BSD, IOCP on Windows — wrapped in one common interface. Node's standard library glues the two together and exposes them to user code as JavaScript APIs. Everything else — Express, Next.js, npm's million packages — is just user code on top. The architecture is the reason JavaScript could become a server-side language in the first place: V8 was already fast and free, and libuv abstracted away the per-OS event-loop differences that would have otherwise made Node a per-platform port.
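The sweet spot in miniature — a server that can hold thousands of slow connections on one thread, because nothing in the handler blocks:

import { createServer } from 'node:http';

createServer((req, res) => {
  // Each request is just a callback on the event loop. While one response
  // waits on this timer (standing in for a slow database query), every
  // other connection proceeds on the same thread.
  setTimeout(() => {
    res.setHeader('Content-Type', 'application/json');
    res.end(JSON.stringify({ ok: true }));
  }, 100);
}).listen(3000);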

05 — The DOM and rendering

From bytes to pixels in five steps.

Loading a web page does not "show" the HTML. The browser, after fetching the bytes, runs them through a five-stage pipeline called the critical rendering path. The HTML is parsed into a tree (the DOM). The CSS is parsed into another tree (the CSSOM). The two are combined into a render tree describing what actually gets drawn. Layout computes where each box goes. Paint draws the pixels. Composite assembles layers. JavaScript can intervene at any stage — modifying the DOM, modifying the CSSOM, forcing a re-layout, triggering a repaint. Understanding this pipeline is the difference between a page that renders in 60 milliseconds and a page that stutters; understanding it is also necessary to read any modern frontend performance literature.

Fig 12.7 — Critical rendering path · five stages, ~16 ms per frame · parse, style, layout, paint, composite

Every frame the browser renders, it walks some prefix of this pipeline. Adding a new DOM element forces all five stages — parse, style, layout, paint, composite. Changing a CSS color forces paint and composite but skips layout. Animating transform: translateX() only touches composite — the GPU shifts an existing layer without re-laying-out anything. The performance gospel of the modern web ("animate transform and opacity, never width or height") falls out directly: cheap properties skip the expensive stages. Tools like the Chrome DevTools Performance panel let you watch this pipeline run frame by frame and see exactly which property change forced layout when.

The DOM itself is, mechanically, just a tree of objects exposed to JavaScript through methods like document.querySelector() and element.appendChild(). Modifying a DOM node triggers the relevant prefix of the pipeline; that is why naive "loop-and-append" code in JavaScript can be hundreds of times slower than building up a string and assigning to innerHTML once. React, Vue, Svelte, and the rest of the modern frontend framework family exist primarily to batch DOM changes — the programmer writes declarative rules ("the page should look like X"), the framework computes the minimum set of DOM mutations needed, and the browser pipeline runs once instead of fifty times. The DOM is slow only when you ask it to be.
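The batching idea in miniature — a sketch where list and items are assumed to exist on the page:

// Naive: N separate insertions, each invalidating the rendering pipeline.
for (const item of items) {
  const li = document.createElement('li');
  li.textContent = item;
  list.appendChild(li);
}

// Batched: build the subtree off-DOM, attach once — the pipeline runs once.
const fragment = document.createDocumentFragment();
for (const item of items) {
  const li = document.createElement('li');
  li.textContent = item;
  fragment.appendChild(li);
}
list.appendChild(fragment);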

06 — Web security

Four boundaries that keep the web from being one big shared computer.

The browser executes JavaScript from arbitrary websites with no pre-arranged trust. You visit news.example.com and it runs a hundred kilobytes of code on your machine. The same browser tab might be logged into your bank in another window. The same machine might have access to your filesystem, your camera, your microphone. The fact that visiting a webpage does not automatically compromise everything else you care about is not magic; it is four specific, layered defences invented over thirty years in response to attacks that worked. This section walks through them in the order they were added.

The same-origin policy: the foundational wall

The oldest and most important browser security boundary is the same-origin policy, introduced in Netscape 2.0 (1996) — the same release that shipped JavaScript itself. The rule is conceptually simple: code loaded from one origin can read and modify only resources from that same origin. An origin is the triple (scheme, host, port). Two URLs share an origin if and only if all three match exactly. https://a.example.com and https://b.example.com are different origins. http://example.com and https://example.com are different origins. http://example.com:80 and http://example.com:8080 are different origins. Without this rule, JavaScript on any tab could read the cookies and DOM of any other tab — including your bank's.

Fig 12.8 — Same-origin policy · scheme + host + port must all match · three components, all-or-nothing

Reference origin: https://www.example.com:443 (scheme · host · port)
https://www.example.com:443/page → ✓ same origin
http://www.example.com:443/page → ✗ scheme differs
https://api.example.com:443/page → ✗ host differs
https://www.example.com:8443/page → ✗ port differs
CORS — Cross-Origin Resource Sharing — is the explicit opt-in mechanism for cross-origin requests when both sides agree.

The same-origin policy is the boundary every other browser security mechanism builds on top of. JavaScript on https://news.com cannot make an XHR/fetch to https://bank.com and read the response. It cannot read the DOM of an iframe pointing at https://bank.com. It cannot read cookies set by https://bank.com. There are explicit, opt-in escape hatches — CORS for cross-origin XHRs (the server can send Access-Control-Allow-Origin headers), postMessage for cross-origin iframe messaging, JSONP (deprecated) for cross-origin script loading. Everything else is firmly fenced off. This is the rule that makes it safe to have multiple tabs open on different sites at the same time.
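The CORS opt-in is a single response header. A minimal sketch of a server at a hypothetical api.example.com that chooses to let one foreign origin read its responses:

import { createServer } from 'node:http';

createServer((req, res) => {
  // Without this header, the browser would fetch the bytes but refuse to
  // expose them to cross-origin JavaScript.
  res.setHeader('Access-Control-Allow-Origin', 'https://www.example.com');
  res.setHeader('Content-Type', 'application/json');
  res.end(JSON.stringify({ ok: true }));
}).listen(8080);

// In a page served from https://www.example.com:
//   const data = await fetch('https://api.example.com/data').then(r => r.json());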

Cross-site scripting (XSS): when the page itself is the attack

Same-origin policy says: code from origin A cannot touch origin B. But what if the attacker can inject their own code into origin B's page? Then their code is, by definition, same-origin with B — and can do everything B's legitimate code can do. This is cross-site scripting (XSS), the most common web vulnerability for two decades running. It comes in three flavours. Reflected XSS: the attacker crafts a URL with malicious script in the query string; the server echoes the query string back into the page; the script executes on whoever visits the URL. Stored XSS: the attacker submits malicious script as a comment or profile field; the server stores it in the database; every subsequent visitor's browser executes it. DOM-based XSS: a similar attack carried out entirely client-side via JavaScript that interpolates URL parameters into the DOM unsafely.

Fig 12.9 — XSS · three ways for attacker code to run with your origin's privileges

Three pipelines, three injection points, one outcome: attacker code in your origin.

REFLECTED — query string echoed into HTML
① attacker → victim: https://search.example.com/?q=<script>steal()</script>
② server response: <p>Results for: <script>steal()</script></p>
③ browser executes the injected script as same-origin with example.com
Defense: HTML-escape any user input echoed into HTML; never trust query strings.

STORED — payload persisted in database, served to every visitor
① attacker posts comment: Nice post! <script>steal()</script>
② server stores it · every later visitor's HTML contains the script
③ all subsequent visitors' browsers execute the script on every page view
Defense: same as reflected, plus output-encode at render time, not at storage time.

DOM-BASED — injection happens entirely in the browser, never touches the server
① page JS reads location.hash: document.body.innerHTML = location.hash;
② attacker's URL: https://example.com/#<img src=x onerror=steal()>
③ browser parses the hash as HTML, runs the onerror — server saw nothing suspicious
Defense: never assign untrusted strings to innerHTML; use textContent or DOM APIs that don't parse HTML.

All three XSS variants share the same shape — attacker-controlled string ends up where the browser parses it as code — and all three are exploitable as long as user input flows into HTML, JavaScript, or attribute contexts without proper output encoding. The 2010s saw an industry-wide push toward template engines that escape by default (React's JSX, Vue's mustaches, Svelte's curly braces all encode by default), and toward tainted-string static analysis at build time. Even so, XSS still appears every year on the OWASP Top 10. The deeper fix — the one this section is heading toward — is Content Security Policy, which lets the server tell the browser "do not run inline scripts at all, and only run scripts from these specific origins." Even if an attacker injects a <script>, CSP prevents the browser from executing it.
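The defenses in Fig 12.9 reduce to one discipline: encode at the moment a string crosses into an HTML context. A minimal sketch; escapeHtml is a hypothetical helper (not a built-in), and userQuery stands for any untrusted input:

// Replace the five characters HTML treats as syntax with entity references.
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;')
          .replace(/</g, '&lt;')
          .replace(/>/g, '&gt;')
          .replace(/"/g, '&quot;')
          .replace(/'/g, '&#39;');
}

// Reflected/stored: encode at render time, never at storage time.
const page = `<p>Results for: ${escapeHtml(userQuery)}</p>`;

// DOM-based: textContent treats its input as text, never as markup.
document.getElementById('results').textContent = userQuery;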

CSRF: making your browser the attacker

Same-origin policy stops origin A's JavaScript from reading origin B's responses. It does not stop origin A from making requests to origin B. A page on evil.com can make a POST request to bank.com/transfer, and the browser will cheerfully attach bank.com's session cookie to the request. If the bank trusts cookies as the only authentication, the request succeeds — the attacker has just made the user transfer money from their authenticated session, even though the user only visited evil.com. This is cross-site request forgery (CSRF), and the standard defence is the CSRF token: the server includes a random unguessable token in the form HTML; the form submission must echo that token in a hidden field; the server verifies the token before processing. Since evil.com cannot read bank.com's pages (same-origin policy), evil.com cannot read the token, cannot forge a valid submission, and the attack fails.

Fig 12.10 — CSRF · the cookie attaches to the cross-site request unless we add a token

An unguessable token in the form means the attacker can submit but cannot forge.

WITHOUT CSRF TOKEN — attack succeeds
① user logs into bank.com, browser stores session cookie
② user visits evil.com (in another tab, while the bank session is still valid)
③ evil.com runs: <form action="bank.com/transfer" method="POST">…</form> · auto-submit
④ browser sends the POST with bank.com's cookie attached
⑤ bank processes the transfer · attacker won

WITH CSRF TOKEN — attack fails
① bank.com renders the form with hidden <input name="csrf" value="x9k2A…"> · token tied to session
② evil.com cannot read bank.com's HTML (same-origin policy blocks it)
③ evil.com submits a POST without the token (or with a guessed, wrong token)
④ bank receives the POST, validates the token against the session — mismatch
⑤ bank rejects · 403 Forbidden · attack fails

Modern alternative: the SameSite=Strict cookie attribute · the cookie is not sent on a cross-site POST at all.
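A minimal sketch of the token round-trip in Node.js. The in-memory session store and the function names (renderTransferForm, validateCsrf) are illustrative, not a real framework API:

const crypto = require('node:crypto');

const sessions = new Map(); // sessionId -> { csrfToken }

// On GET: mint a fresh token, tie it to the session, embed it in the form.
function renderTransferForm(sessionId) {
  const token = crypto.randomBytes(32).toString('hex');
  sessions.set(sessionId, { csrfToken: token });
  return `<form method="POST" action="/transfer">
  <input type="hidden" name="csrf" value="${token}">
</form>`;
}

// On POST: reject unless the submitted token matches the session's token.
function validateCsrf(sessionId, submitted) {
  const expected = sessions.get(sessionId)?.csrfToken;
  if (!expected || typeof submitted !== 'string') return false;
  const a = Buffer.from(submitted, 'utf8');
  const b = Buffer.from(expected, 'utf8');
  // timingSafeEqual requires equal lengths; comparing this way avoids
  // leaking the token byte-by-byte through response timing.
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}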

CSRF tokens are an old defence (~2002); the modern complement is the SameSite cookie attribute, introduced in 2016 and defaulted to Lax by Chromium-based browsers since 2020, with other engines following to varying degrees. SameSite=Lax means: attach this cookie to cross-site requests only when they are top-level GET navigations, never to cross-site POSTs or subresource requests. SameSite=Strict goes further: never attach the cookie to any cross-site request at all. Together with CSRF tokens, they make the classical CSRF attack mostly obsolete on a properly configured site. The deeper lesson is that browser security has, over twenty years, moved from "defend at the application layer" (CSRF tokens, anti-XSS escaping) toward "defend at the platform layer" (SameSite, CSP, Trusted Types). The web's own primitives are getting safer; the application code matters less than it used to.
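On the wire, the whole mechanism is one attribute on the Set-Cookie response header. A minimal sketch with illustrative cookie names and values:

Set-Cookie: session=abc123; Secure; HttpOnly; SameSite=Lax
  → attached to top-level cross-site navigations, never to cross-site POSTs or subresources
Set-Cookie: session=abc123; Secure; HttpOnly; SameSite=Strict
  → never attached to any cross-site request
Set-Cookie: embed=abc123; Secure; SameSite=None
  → always attached cross-site; browsers require Secure alongside None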

CSP: telling the browser which scripts you trust

The strongest mitigation in the modern web security stack is Content Security Policy (CSP), introduced 2010, standardised 2014, deployed everywhere by the late 2010s. CSP is a response header (Content-Security-Policy: …) in which the server declares, in a small declarative language, exactly which origins the browser is permitted to load resources from — separately for scripts, styles, images, frames, fonts, fetch destinations, and every other resource type. A typical strict policy: script-src 'self' https://cdn.example.com; style-src 'self' means: "execute scripts only from the same origin and from cdn.example.com; load styles only from the same origin; reject everything else." If an attacker manages to inject a <script src="evil.com/x.js">, the browser sees that evil.com is not in the policy and refuses to load the script. The XSS attack from Fig 12.9 is neutralised even though the injection succeeded.

Fig 12.11 — CSP · the server tells the browser which scripts to trust, the browser refuses the rest

A short header that defangs most XSS even after injection succeeds.

SERVER RESPONSE HEADER
Content-Security-Policy: default-src 'self'; script-src 'self' https://cdn.example.com; style-src 'self' 'unsafe-inline';

BROWSER ENFORCES — every resource load checked against the policy
<script src="/app.js">                         ✓ self origin
<script src="https://cdn.example.com/lib.js">  ✓ allow-listed
<script src="https://evil.com/x.js">           ✗ not in policy
<script>steal()</script> (inline)              ✗ no 'unsafe-inline'
eval("…")                                      ✗ no 'unsafe-eval'

CSP also reports violations to a configurable URL — XSS attempts become alerts in your monitoring system.

CSP is unique among the defences in this section: it makes the browser an active enforcer of an application-level policy. Even if attacker code is injected into the page (a server bug let it through; a third-party script was compromised; a stored-XSS payload made it past sanitisation), the browser refuses to execute anything not on the allow-list. Strict CSP — script-src 'self' with no 'unsafe-inline', all scripts in external files — eliminates entire classes of XSS. The cost is real: legacy code that uses inline event handlers (onclick="…") breaks; inline scripts must move to external files or be allowed per-response via a cryptographic nonce; third-party widgets need their origins enumerated. But the security gain has been enough to push serious production sites toward strict CSP over the past decade. As of 2026, a major site shipping without CSP is the exception.
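A minimal sketch of serving a nonce-based strict policy with Node's built-in http module; the port and page content are illustrative. The nonce changes on every response, so an attacker who injects markup cannot guess it:

const http = require('node:http');
const crypto = require('node:crypto');

http.createServer((req, res) => {
  // Fresh per-response nonce; only scripts carrying it may execute.
  const nonce = crypto.randomBytes(16).toString('base64');
  res.setHeader('Content-Security-Policy',
    `default-src 'self'; script-src 'self' 'nonce-${nonce}'; style-src 'self'`);
  res.setHeader('Content-Type', 'text/html');
  res.end(`<!doctype html>
<script nonce="${nonce}">console.log('runs: nonce matches');</script>
<script>console.log('blocked: inline, no nonce');</script>`);
}).listen(8080);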

Closing the chapter, closing the part

JavaScript was designed in ten days by one person under the wrong brief, given an unrelated marketing name to ride a competitor's hype, shipped before specification, and exported from the browser to the server fourteen years after birth. The event loop it inherited from the early browser is now the dominant concurrency model on the modern web. The engines that run it are masterpieces of compiler engineering that exist only because the language is everywhere. The security mechanisms that surround it — same-origin policy, CSRF tokens, Content Security Policy, Trusted Types, Subresource Integrity — were retrofitted over thirty years in response to attacks that worked. None of this was inevitable. All of it became, by accumulation, the shape of the modern web.

Part III is now complete. We started with voltage on a wire (Chapter 8), promoted it into routed packets (Chapter 9), built reliability on top of unreliability (Chapter 10), turned the network into a worldwide library of trustworthy documents (Chapter 11), and made those documents alive (Chapter 12). The reader who has followed all five chapters now has, in principle, the ability to read every layer of an HTTPS request from voltage on copper to a piece of JavaScript modifying the DOM and rendering at 60 frames per second. Part IV picks up where Part III left off: where information lives at rest, and how it stays trustworthy when nobody is watching the connection. The relational database. The cryptographic primitives in mathematical depth. And the unified security chapter that ties together every attack we have foreshadowed in Parts I, II and III into a single working theory of why systems break and how we keep them standing.

End of Part III

The network is built.

Voltage to packets to TCP to HTTP to JavaScript. Five chapters, every layer from copper to pixels. Part IV turns to where information lives at rest — the relational database, real cryptography, and the unified security chapter that ties Parts I through III together.