The voltage on a transistor extends to the voltage on a continent.
Part I closed inside a CPU. Part III leaves the CPU. The same voltage transition that switched a transistor in Chapter 1 — a small step from one level to another, interpreted by everything downstream as "the bit changed" — now travels: across the motherboard, out a network interface, down a cable, across an ocean, into another machine on the other side of the planet. The physics is unchanged. What grows is the discipline required to keep the signal intact over that distance.
A wire, electrically, is just a conductor with two ends. Push voltage in at one; some attenuated, slightly delayed version of that voltage shows up at the other. The delay is a function of length and cable type — about 5 ns per metre on copper, and much the same on fibre, since light in glass travels at roughly two-thirds of its vacuum speed. The attenuation is a function of frequency, distance, and material. The noise added along the way is a function of everything else in the universe nearby: lightning, fluorescent lights, microwave ovens, other wires, cosmic rays. A one-metre USB cable inside a desk hides all of this. A two-thousand-kilometre undersea cable cannot.
The story of the network is, at its core, the engineering of scaling this single phenomenon. How do you keep a voltage signal recognisable after a kilometre of copper? After a kilometre of fibre? After a hundred kilometres of either? After a wireless link through a forest in the rain? The answers come from physics, mathematics, and a great deal of practitioner cleverness. The voltage on a wire is the substrate; everything else in the next five chapters is what we build on top of it.
A bit travelling between two transistors inside a CPU and a bit travelling between New York and London are the same physical phenomenon — a voltage transition propagating along a conductor — observed at scales twelve orders of magnitude apart. The CPU version completes in roughly a nanosecond and is barely degraded. The transatlantic version takes about 28 milliseconds, passes through dozens of optical amplifiers, and arrives noticeably degraded — but still recognisable as the original bit pattern. Everything in this chapter is the engineering that makes the second version possible.
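For a sense of where these numbers come from, here is a minimal back-of-the-envelope sketch in Python. The 5,600 km cable-path length and the 5 ns-per-metre figure are illustrative assumptions, not measurements of any particular chip or cable.

```python
def propagation_delay_s(distance_m: float, ns_per_m: float = 5.0) -> float:
    """One-way propagation delay, assuming ~5 ns per metre (about two-thirds c)."""
    return distance_m * ns_per_m * 1e-9

on_chip = propagation_delay_s(0.001)            # ~1 mm of on-die wiring (assumed)
transatlantic = propagation_delay_s(5_600_000)  # ~5,600 km of cable path (assumed)

print(f"on-chip wire:  {on_chip * 1e9:.3f} ns")      # ~0.005 ns; gate delays dominate the ~1 ns figure
print(f"transatlantic: {transatlantic * 1e3:.0f} ms")  # ~28 ms one way
```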
Electrons in copper, photons in glass, waves through air.
All networking happens over one of three substrates. Copper carries electrons. Fibre carries photons. Wireless carries electromagnetic waves through space. They look very different. They all do the same thing: transport bits as variations in some physical quantity that someone at the other end can measure.
Copper is the original. A pair of conductors, twisted together to cancel out external electromagnetic interference (that is the "twisted pair" in Cat 6 twisted pair). The signal is voltage between the two wires. Cheap to manufacture, easy to terminate, but attenuates fast — roughly 20 dB per 100 metres at the frequencies modern Ethernet uses, which is why a single copper run is limited to about 100 metres before a switch or repeater is required. Almost every cable inside a house or office is copper.
Fibre is glass. The signal is light, pulsed at frequencies around 200 THz. The light is launched into a thin glass core surrounded by a cladding with a slightly lower refractive index, so total internal reflection traps the light inside the core all the way to the receiver. Attenuation is dramatically lower — modern fibres lose about 0.2 dB per kilometre, allowing tens to hundreds of kilometres between repeaters. Every undersea cable, every long-distance internet trunk, every data-centre backbone is fibre. The first transatlantic fibre cable, TAT-8, went into service in 1988 and carried 280 megabits per second — at the time, more than the combined capacity of every previous transatlantic cable in history.
Wireless is the absence of any medium at all — bits as variations in an electromagnetic wave propagating through space. Wi-Fi runs around 2.4 GHz and 5 GHz; cellular at 700 MHz to several GHz; satellite up to tens of GHz. The trade is fundamental: no wire to install, but every receiver in range hears every transmission, the medium is shared with every microwave oven and every Bluetooth device, and atmospheric absorption is a factor at the higher frequencies. The mathematics of how to share that single medium efficiently is the bulk of cellular and Wi-Fi design.
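The reach numbers above follow directly from the decibel arithmetic. The sketch below, with an assumed 30 dB link budget and the rough per-distance losses quoted in this section, shows why copper runs out at around a hundred metres while fibre keeps going for over a hundred kilometres.

```python
def remaining_power_fraction(loss_db: float) -> float:
    """Fraction of transmitted power left after a given loss in decibels."""
    return 10 ** (-loss_db / 10)

def reach_km(budget_db: float, loss_db_per_km: float) -> float:
    """Distance at which cumulative attenuation uses up the link budget."""
    return budget_db / loss_db_per_km

COPPER_DB_PER_KM = 200.0  # ~20 dB per 100 m of twisted pair at Ethernet frequencies
FIBRE_DB_PER_KM = 0.2     # modern single-mode fibre

BUDGET_DB = 30.0          # assumed receiver tolerance, for illustration only
print(f"copper reach: {reach_km(BUDGET_DB, COPPER_DB_PER_KM) * 1000:.0f} m")  # 150 m
print(f"fibre reach:  {reach_km(BUDGET_DB, FIBRE_DB_PER_KM):.0f} km")         # 150 km
print(f"power left after 30 dB: {remaining_power_fraction(30):.3f}")          # 0.001
```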
The substrate is the first attack surface. Each of these three media leaks differently. Copper wires emit small amounts of electromagnetic radiation that a receiver nearby can demodulate back into the original signal — the basis of TEMPEST, a US classification programme dating to the 1960s for shielding sensitive equipment against precisely this. (TEMPEST attacks against unshielded VGA cables can be performed with off-the-shelf radios from across a room; against modern equipment, from across a parking lot.) Optical fibres are harder but not immune: bending a fibre slightly leaks a small fraction of its light through the cladding, and the Snowden documents in 2013 revealed that the NSA and GCHQ had been systematically tapping undersea fibre cables at landing stations for years. Wireless is hardest of all to secure: every transmission goes to every receiver in range. The whole story of Wi-Fi security — WEP (Wired Equivalent Privacy, 1999, broken by 2001), WPA (2003, broken 2008), WPA2 (2004, broken 2017 by the KRACK attack), WPA3 (2018, still standing) — is the engineering response to the fact that on a radio link, the attacker is always already in the room.
A request from your laptop to a server in Tokyo crosses all three substrates — Wi-Fi from laptop to router, copper or fibre from router to ISP, fibre across the Pacific, more fibre into the destination data centre, finally copper or fibre to the server's NIC. Each hop is a different combination of physical medium and encoding. The application layer never sees any of it; it sees a TCP socket, which sees an IP route, which sees a series of link-layer frames, which sees, finally, a stream of voltage transitions or photon pulses or radio symbols. The whole stack exists to abstract away which substrate is in use at any moment.
1948. Bell Labs again. The number behind every wire.
In July and October of 1948, Claude Shannon — a thirty-two-year-old mathematician at Bell Labs — published "A Mathematical Theory of Communication" in two parts of the Bell System Technical Journal. It is the founding document of information as a quantifiable thing. Before this paper, "amount of information" was a metaphor. After it, information had units (bits) and a formula (entropy). And Shannon proved a result that still defines every modern network: every channel has a maximum bit rate, you can send reliably below it, and you cannot send reliably above it.
Shannon himself is one of the strangest figures in twentieth-century science, and worth knowing. He grew up in Gaylord, Michigan, the son of a probate judge and a high-school principal. He arrived at MIT in 1936 as a graduate student and, in a master's thesis written the next year, did something that no one had thought to do: he showed that the ideas of George Boole's nineteenth-century algebra of logic could be implemented as electrical relay circuits, and that any logical proposition could therefore be computed by a network of switches. Howard Gardner later called this "possibly the most important master's thesis of the twentieth century." Shannon was twenty-one. The thesis is the bridge between Chapter 2 of this book and everything that followed it: it is the moment Boolean logic stopped being philosophy and started being engineering.
He stayed strange. At Bell Labs through the 1940s, where he wrote the information theory paper essentially on his own (the whole field sprang from one head, in one paper, more or less complete), he was known for riding a unicycle through the corridors while juggling. He built an electromechanical mouse named Theseus that could solve a maze and remember the solution — arguably the first artificial learning machine, 1950. He later built rocket-powered pogo sticks, flame-throwing trumpets, and a chess-playing automaton. The papers he wrote in his "spare time" included the first paper on computer chess, the first paper on cryptography as information theory (declassified 1949 — it had been classified during the war), and a mathematical theory of juggling. He worked alone, refused most academic politics, and was difficult to interview. He died in 2001, having lived to see the entire networked world he had given the mathematics to. There is no later figure as singular.
The key definition in his 1948 paper, called entropy after the thermodynamics quantity it resembles, measures how much "surprise" — how much information — is contained in a probability distribution. For a discrete source emitting symbols with probabilities p₁, p₂, …, pₙ, the entropy is:
H = −Σ pᵢ log₂(pᵢ)
Measured in bits per symbol. A fair coin (½, ½) has H = 1 bit per flip — maximum surprise. A coin that always lands heads (1, 0) has H = 0 — no information. A biased coin (¾, ¼) has H ≈ 0.81 bits — between the two. The deeper insight is that this number is also the average code length any compression scheme can achieve, asymptotically. Shannon proved you can compress a source down to its entropy, and no further. ZIP, gzip, JPEG, MP3, H.264, every modern compressor — all sit somewhere on Shannon's curve, fighting for the last bit.
Entropy of a binary source as the probability of "1" varies from 0 to 1. At the extremes (always 0 or always 1) there is no surprise — every bit is predictable, so each symbol carries zero information. At p = ½ — a fair coin — every flip is maximally surprising and carries one full bit of information. The curve is symmetric: a 70% / 30% source carries the same entropy as a 30% / 70% source, since "uncertainty" doesn't care which side you bet on. This single curve is the lower bound on lossless compression for any binary source. Compress a fair coin and you cannot do better than one bit per flip. Compress a 90/10 source and the theoretical minimum is about 0.47 bits per flip — actually achievable by arithmetic coding to within a few percent.
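The entropy formula is short enough to check by hand or in a few lines of code. This sketch reproduces the numbers above; the function is a direct transcription of Shannon's definition, not any particular library's implementation.

```python
import math

def entropy_bits(probs):
    """H = -sum(p_i * log2(p_i)), in bits per symbol; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy_bits([0.5, 0.5]))    # 1.0    fair coin
print(entropy_bits([1.0, 0.0]))    # 0.0    always heads, no information
print(entropy_bits([0.75, 0.25]))  # ~0.811 biased coin
print(entropy_bits([0.9, 0.1]))    # ~0.469 the 90/10 source from the figure
```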
Shannon's second result, the noisy channel coding theorem, put a number on what a wire can carry. Given a channel of bandwidth B hertz and signal-to-noise ratio SNR, the maximum reliable bit rate — the channel capacity — is:
C = B · log₂(1 + SNR)
If you transmit at rate R < C, there exists a coding scheme that achieves arbitrarily low error rate. If R > C, there is no such scheme — errors are inevitable, no matter how clever the encoding. The proof is non-constructive: Shannon proved good codes exist, decades before anyone knew how to build them. Modern error-correcting codes (LDPC, Polar, Turbo) approach within a fraction of a dB of Shannon's limit. The race to close that gap is the story of half of late-twentieth-century communications engineering.
Shannon's channel capacity curve: the maximum number of bits per second per Hertz of bandwidth as a function of signal-to-noise ratio. A 56k phone modem operated at roughly 30 dB SNR over a 4 kHz channel; the math gives 4000 × 10 ≈ 40 kbps, close to what was achieved in practice. Modern Wi-Fi 6 at ~20 dB SNR over 80 MHz channels achieves several hundred Mbps. Coherent fibre at ~15 dB effective SNR over tens of GHz of bandwidth pushes terabits per second on a single strand. The curve says nothing about how to achieve these rates — only that they cannot be exceeded. Every modulation, coding, and equalisation technique invented since 1948 is, in the end, a different way to climb closer to this single curve.
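The capacity formula can be checked the same way. The bandwidths and SNR figures below are the rough, illustrative values from the caption, not measured channel parameters; a terabit fibre strand reaches its total by multiplexing many such optical carriers.

```python
import math

def shannon_capacity_bps(bandwidth_hz: float, snr_db: float) -> float:
    """C = B * log2(1 + SNR), with the SNR supplied in dB."""
    snr_linear = 10 ** (snr_db / 10)
    return bandwidth_hz * math.log2(1 + snr_linear)

print(f"{shannon_capacity_bps(4e3, 30) / 1e3:.0f} kbps")   # ~40 kbps phone line
print(f"{shannon_capacity_bps(80e6, 20) / 1e6:.0f} Mbps")  # ~533 Mbps Wi-Fi 6 channel
print(f"{shannon_capacity_bps(50e9, 15) / 1e9:.0f} Gbps")  # ~250 Gbps per optical carrier
```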
How a one-or-zero rides on a continuous wave.
Knowing the channel can carry ten million bits per second tells you nothing about how to actually represent those bits as voltage, light, or radio. Encoding is the layer that maps abstract ones and zeros onto physical waveforms the receiver can decode. Different encodings exist because they trade different properties: clock recovery, bandwidth efficiency, robustness to noise, ease of synchronisation. None is optimal everywhere; each is right somewhere.
The most naive encoding is NRZ — Non-Return to Zero. High voltage means 1; low voltage means 0. The signal sits at one of two levels and only changes when a new bit arrives. It is dead simple. It also has a fatal weakness: a long run of identical bits looks like a flat line. The receiver, reading bits at some clock rate, has no way to know when one bit ends and the next begins. Drift even slightly out of sync and the entire stream is corrupted from there onward.
Manchester encoding solves the synchronisation problem by embedding the clock in the data. Every bit is split into two halves; a transition in the middle carries the bit value (high-to-low for one polarity, low-to-high for the other). There is a transition in every bit period, no matter what the data is. The receiver locks onto those transitions and stays in sync indefinitely. The cost is double bandwidth: every bit needs two half-bit slots. Original Ethernet (10 Mbps over coax) used Manchester encoding for exactly this reason — it could not afford to lose synchronisation on a long run of zeros.
Manchester encoding represents each bit as a transition in the middle of a fixed time slot. A high-to-low transition is one value (often "1"); a low-to-high is the other. Because there is always a transition mid-bit, the receiver can lock its clock onto the data stream regardless of how many identical bits arrive in a row. The cost is that the signal alternates twice as fast as the underlying bit rate, requiring double the bandwidth — a 10 Mbps Ethernet stream actually carries 20 million signal transitions per second. This is why Ethernet over Manchester encoding maxed out at 10 Mbps over typical Cat 3 twisted pair; the next jump to 100 Mbps required moving to a more efficient encoding (4B5B over Cat 5) plus clever line coding.
The same bit stream — 1 0 0 0 0 1 1 — encoded two ways. In NRZ, four consecutive zeros produce four identical time slots of low voltage; nothing in the signal tells the receiver where one zero ends and the next begins. If the receiver's clock drifts even slightly, the count is lost. In Manchester, every bit is a transition, so the receiver re-syncs its clock on every bit period regardless of the data. The trade is bandwidth: Manchester doubles the rate of signal changes per second. Modern high-speed encodings (8B/10B, 64B/66B) combine the bandwidth efficiency of NRZ with periodic forced transitions to maintain synchronisation — the best of both worlds, at the cost of a small overhead in encoded bits.
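A toy encoder makes the difference concrete. The sketch below maps the same bit stream both ways, using +1 and −1 for the two voltage levels; the choice of which transition direction means "1" is one common convention, assumed here purely for illustration.

```python
def nrz(bits):
    """One level per bit: +1 for a 1, -1 for a 0."""
    return [+1 if b else -1 for b in bits]

def manchester(bits):
    """Two half-bit levels per bit, so there is always a mid-bit transition.
    Convention assumed here: 1 = high-to-low, 0 = low-to-high."""
    out = []
    for b in bits:
        out += [+1, -1] if b else [-1, +1]
    return out

stream = [1, 0, 0, 0, 0, 1, 1]
print(nrz(stream))         # [1, -1, -1, -1, -1, 1, 1] : four flat slots in a row
print(manchester(stream))  # twice the samples, but a transition in every bit period
```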
When the channel cannot carry baseband signals — radio, DSL, cable modems, optical fibre with multiple wavelengths — the bits ride on a carrier wave, a high-frequency sinusoid that the medium can transport. Modulation is the process of varying some property of the carrier in step with the data: amplitude (AM), frequency (FM), or phase (PSK). More elaborate schemes combine these — QAM (Quadrature Amplitude Modulation) varies amplitude and phase simultaneously, packing multiple bits per symbol.
Three ways to map digital bits onto a sinusoidal carrier. AM changes the amplitude — easy to see on an oscilloscope, vulnerable to noise (which mostly affects amplitude). FM changes the frequency — invented by Edwin Armstrong in 1933 specifically because it tolerates amplitude noise; this is why FM radio sounds clean during thunderstorms while AM does not. PSK changes the phase — robust, efficient, and the basis of every modern digital radio. Real-world systems compose these: QAM changes amplitude and phase together to pack 4, 6, 8, even 10 bits into a single symbol. Wi-Fi 6 uses 1024-QAM (10 bits per symbol) under good conditions, falling back to 64-QAM or QPSK as the channel degrades.
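A sketch of the QAM idea: each group of four bits selects one of sixteen combinations of amplitude and phase. The Gray-coded level mapping below is one common convention, chosen for illustration rather than taken from any particular standard.

```python
import cmath

# Gray-coded amplitude levels for each 2-bit pair (an assumed, illustrative mapping).
LEVELS = {(0, 0): -3, (0, 1): -1, (1, 1): +1, (1, 0): +3}

def qam16_symbol(b3, b2, b1, b0):
    """Map 4 bits to one 16-QAM constellation point, as a complex number I + jQ."""
    return complex(LEVELS[(b3, b2)], LEVELS[(b1, b0)])

sym = qam16_symbol(1, 0, 1, 1)
print(sym, abs(sym), cmath.phase(sym))  # the carrier is sent at this amplitude and phase
# 16-QAM carries 4 bits per symbol; 1024-QAM extends the same grid to 10 bits per symbol.
```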
Metcalfe, 1973: many computers, one cable.
In 1973, Bob Metcalfe at Xerox PARC invented Ethernet as part of his doctoral work. The original idea was elegant in its plainness: any number of computers share one cable; they all listen; whenever no one else is talking, anyone can send; if two start at the same time and the signals collide, both detect it, both back off for a random interval, and both try again. Carrier Sense, Multiple Access, with Collision Detection — CSMA/CD. That single idea, wrapped in increasingly fast versions of itself, became the dominant local-area networking technology on the planet.
Metcalfe arrived at the idea by an unusual route. His Harvard PhD thesis on packet networking had been rejected — the committee felt it lacked theoretical depth — and the rejection sent him into a foul mood and onto a flight to Hawaii, where he had heard about a packet-radio network called ALOHAnet built by Norm Abramson at the University of Hawaii. ALOHAnet let the islands' campuses share one radio frequency by transmitting whenever they had data and retransmitting after random delays when collisions occurred. Metcalfe analysed the math on the plane home, realised that Abramson's model badly underestimated the utilization the protocol could achieve with smarter retransmission timing, wrote the corrections into a revised thesis, and got his doctorate. He then took the same insight, improved with carrier sensing ("listen before transmit"), and applied it to a wire instead of radio. That was Ethernet. The thesis Harvard rejected became, with the addition of practical engineering, the protocol that connected most offices in the world.
The Xerox PARC of 1973 was, in retrospect, the densest concentration of computing talent ever assembled in one place. Down the hall from Metcalfe, Alan Kay's group was inventing the personal computer (the Alto, 1973 — the machine that taught Steve Jobs what a graphical interface was), Adele Goldberg and Dan Ingalls were building Smalltalk, Charles Simonyi was writing what would become Microsoft Word, and Gary Starkweather was building the laser printer. Metcalfe's job was to network these machines together. His original Ethernet sketch is one of the artefacts in the Computer History Museum: a hand-drawn diagram on a piece of yellow paper, the date "May 22, 1973" in the corner, the basic architecture of every wired LAN since.
The original 10BASE5 Ethernet ran at 10 megabits per second over a single thick coaxial cable up to 500 metres long — the "thicknet" of the early 1980s. By the 1990s, switches replaced the shared cable; each device got its own dedicated cable to the switch, and collisions stopped happening because there was no one else on the wire. The collision-detection machinery is now mostly vestigial. But every Ethernet frame still carries the same header structure invented for that shared cable — including the famous 48-bit MAC addresses every network card has.
"Networking is inter-galactic."
— Bob Metcalfe, on a whiteboard at Xerox PARC, 1973

Two stations sense an idle wire and start sending almost simultaneously. The signals collide somewhere along the cable. Both stations detect the collision (the voltage on the wire goes higher than either alone could produce), both stop, both broadcast a brief JAM signal so any third party knows to discard the partial frame, and both pick a random delay before retrying. The randomness is critical — if both used the same delay, they would collide again immediately. Ethernet uses truncated binary exponential backoff: after the k-th consecutive collision, each station waits a random number of slot-times in the range [0, 2ᵏ − 1]. The expected wait grows exponentially with congestion, so heavily-contended networks self-throttle. With switched Ethernet, this whole machinery is dormant — but it is still in every NIC's firmware, and the same listen-then-back-off idea, recast as collision avoidance, is what makes Wi-Fi (which still has a shared medium) work.
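The backoff rule is simple enough to write down directly. This sketch follows classic 10 Mbps Ethernet (51.2 µs slot time, exponent capped at 10), though the constants here are illustrative rather than normative.

```python
import random

SLOT_TIME_US = 51.2   # one slot time on classic 10 Mbps Ethernet

def backoff_slots(collision_count: int) -> int:
    """After the k-th consecutive collision, wait a random number of slots in [0, 2^k - 1]."""
    k = min(collision_count, 10)   # the exponent is capped, hence "truncated"
    return random.randint(0, 2 ** k - 1)

for k in range(1, 6):
    slots = backoff_slots(k)
    print(f"collision #{k}: wait {slots:4d} slots ({slots * SLOT_TIME_US:.1f} µs)")
```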
Every Ethernet device has a MAC address — a 48-bit identifier burned into the network card at manufacture. The first 24 bits are the OUI (Organizationally Unique Identifier), assigned to the manufacturer by the IEEE; the next 24 are the device-specific portion. The MAC address tells you who built the network card. Apple and Cisco each own hundreds of prefixes; Raspberry Pi devices begin B8:27:EB. There are public databases mapping every OUI to its owner.
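Splitting a MAC address into its two halves is a one-liner. In the sketch below, the example address and the tiny OUI table are hypothetical stand-ins for a real card and the public IEEE registry.

```python
# KNOWN_OUIS is a tiny stand-in for the public IEEE registry; the example MAC is made up.
KNOWN_OUIS = {"B8:27:EB": "Raspberry Pi Foundation"}

def split_mac(mac: str):
    parts = mac.upper().split(":")
    oui, device = ":".join(parts[:3]), ":".join(parts[3:])
    return oui, device, KNOWN_OUIS.get(oui, "unknown vendor")

print(split_mac("b8:27:eb:4f:12:9a"))
# ('B8:27:EB', '4F:12:9A', 'Raspberry Pi Foundation')
```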
MAC addresses identify devices on the local link. IP addresses, which we'll meet in Chapter 9, identify devices on the global network. ARP — Address Resolution Protocol — is the mechanism that bridges the two. When a host wants to send a packet to 192.168.1.5 on its own subnet, it broadcasts a question to every device on the link: "Who has 192.168.1.5? Tell me your MAC." The owner replies; the asker caches the answer for a few minutes; subsequent packets to that IP go straight to the matching MAC.
A MAC address splits cleanly into manufacturer (OUI) and device-specific halves. The OUI B8:27:EB is owned by the Raspberry Pi Foundation, so this address is on a Pi. ARP keeps a small cache mapping known IP addresses on the local subnet to their MAC addresses. When a host wants to send to an IP not in the cache, it broadcasts an ARP request ("who has 192.168.1.5?"); the owner replies with a unicast ARP response. Entries expire after a few minutes so that disconnected devices don't accumulate. ARP has no authentication — anyone on the link can claim any IP, which is the basis of ARP spoofing (Chapter 15) — but on a properly switched modern LAN, the threat is contained because spoofing requires being on the local segment to begin with.
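The cache behaviour described in the caption fits in a dozen lines. In this sketch, send_arp_request is a hypothetical stand-in for the real broadcast-and-wait; everything else mirrors the lookup, miss, and expiry logic described above.

```python
import time

CACHE_TTL_S = 120.0                            # entries expire after a couple of minutes
arp_cache: dict[str, tuple[str, float]] = {}   # ip -> (mac, time learned)

def send_arp_request(ip: str) -> str:
    """Hypothetical stand-in: really this broadcasts 'who has <ip>?' and waits for the reply."""
    return "B8:27:EB:4F:12:9A"                 # pretend the owner answered

def resolve(ip: str) -> str:
    entry = arp_cache.get(ip)
    if entry and time.time() - entry[1] < CACHE_TTL_S:
        return entry[0]                        # cache hit: nothing sent on the wire
    mac = send_arp_request(ip)                 # cache miss: broadcast and wait
    arp_cache[ip] = (mac, time.time())
    return mac

print(resolve("192.168.1.5"))   # first call broadcasts; repeat calls hit the cache until the TTL lapses
```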
A hub is a dumb electrical repeater: every signal that arrives on any port is rebroadcast on every other port. Four hosts on a hub means every frame is heard by all four; collisions are routine; bandwidth is shared. A switch is intelligent: it observes the source MAC of every frame that arrives, learns which MAC sits on which port, and forwards each subsequent frame only to the port leading to its destination. Other ports stay quiet. Modern Ethernet is universally switched, with each cable a private link between one device and one port. The collision-detection machinery still exists in every NIC for compatibility, but on switched links it never fires. This single change — hub to switch, late 1990s — is what made Ethernet scale from megabits to gigabits to terabits per second.
MAC flooding — when a switch forgets it is a switch. A switch's MAC table has finite size — usually a few thousand entries on consumer hardware, tens of thousands on enterprise gear. An attacker with access to one port can rapidly send frames with a million different fabricated source MACs. The switch dutifully records each one, fills the table to capacity, and falls back to what it does when it does not know which port a destination MAC sits on: broadcast to every port. The switch has been downgraded to a hub. Every frame on the LAN is now visible to the attacker. This attack — called MAC flooding or sometimes CAM-table overflow — was demonstrated by Mike Beekey in 2000 and is the reason serious networks deploy port security (sticky MAC, max-MACs-per-port) on every access switch. It is also a useful illustration of a recurring pattern in networking: the polite, table-based, learning-by-listening designs of the 1990s assumed every participant played fair, and an attacker who declined to play fair could turn cooperative infrastructure against its users.
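A toy learning switch shows how small the gap is between "switch" and "hub". The table capacity below is deliberately tiny so the flooding fallback is easy to trigger; real switches hold thousands of entries, but the failure mode is the same.

```python
TABLE_CAPACITY = 8   # deliberately tiny; real switches hold thousands of entries

class Switch:
    def __init__(self, num_ports: int):
        self.table: dict[str, int] = {}   # MAC -> port it was last seen on
        self.ports = range(num_ports)

    def forward(self, src_mac: str, dst_mac: str, in_port: int) -> list[int]:
        if src_mac not in self.table and len(self.table) < TABLE_CAPACITY:
            self.table[src_mac] = in_port                  # learn by listening
        if dst_mac in self.table:
            return [self.table[dst_mac]]                   # unicast to the known port
        return [p for p in self.ports if p != in_port]     # unknown destination: flood

sw = Switch(num_ports=4)
# An attacker on port 3 fills the table with fabricated source addresses:
for i in range(100):
    sw.forward(f"DE:AD:00:00:00:{i:02X}", "FF:FF:FF:FF:FF:FF", in_port=3)
# The table is now full of junk, so frames to any legitimate MAC the switch never
# learned get flooded out of every port, hub-style, where port 3 can read them.
print(len(sw.table))   # 8: capacity reached, learning has stopped
```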
Why every textbook draws the same layer cake.
The reason network engineers can talk about network problems precisely is the layer model. Each layer does one thing, and only one thing. Each layer is a customer of the layer below and a provider for the layer above. Once you've drawn the layers, every protocol falls into exactly one slot, and every conversation about networking becomes "which layer is this happening at?"
The canonical reference is the OSI seven-layer model, standardised by the International Organization for Standardization in 1984. Bottom to top: Physical, Data Link, Network, Transport, Session, Presentation, Application. Each layer is named after the abstraction it provides. Physical moves bits over a medium. Data Link moves frames between adjacent nodes. Network moves packets across a global graph of nodes. Transport moves byte streams reliably or messages unreliably end to end. Session, Presentation, and Application are about what the bytes mean.
The TCP/IP four-layer model is the one that actually runs the internet. It collapses Physical and Data Link into a single "Link" layer (since the IP layer doesn't care which kind of link is below it), and collapses Session, Presentation, and Application into a single "Application" layer (since real applications usually handle their own session semantics). The result is shorter, more honest, and describes deployed reality better — but the OSI model remains the teaching reference because its separations are pedagogically cleaner.
Each OSI layer is named after the abstraction it provides to the layer above. Physical turns bits into voltage / light / radio. Data Link bundles bits into frames addressed by MAC and shipped across a single hop. Network bundles frames into packets that find their way across the global graph by IP address. Transport turns packets into reliable byte streams (TCP) or messages (UDP). Session manages connection setup and teardown. Presentation handles data formatting, encryption, compression. Application is where the real protocol lives — HTTP, SSH, DNS. In practice the top three layers blur together — most application protocols handle their own session and presentation logic — which is why the simpler TCP/IP four-layer model is what's actually deployed.
The seven OSI layers map to the four TCP/IP layers in a clear way. The bottom two (Physical + Data Link) collapse into TCP/IP's Link layer because, from IP's perspective, anything that delivers a frame to the next hop is good enough. The top three (Session + Presentation + Application) collapse into TCP/IP's Application layer because in practice every real application protocol handles its own session and formatting — HTTP, for instance, does its own connection management, its own content negotiation, its own compression. Use OSI when you need to be precise about which layer a feature belongs to. Use TCP/IP when you're describing what's actually on the wire.
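The collapse between the two models is mechanical enough to write as a literal lookup, which is sometimes the easiest way to remember it; "Internet" below is the conventional name for TCP/IP's network layer.

```python
OSI_TO_TCPIP = {
    "Physical":     "Link",
    "Data Link":    "Link",
    "Network":      "Internet",
    "Transport":    "Transport",
    "Session":      "Application",
    "Presentation": "Application",
    "Application":  "Application",
}

print(OSI_TO_TCPIP["Data Link"])     # Link
print(OSI_TO_TCPIP["Presentation"])  # Application
```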
The seam to Chapter 9
Chapter 8 has been about the physical and link layers — how a single bit makes it across one hop. Chapter 9 climbs to the network layer. Up to now, every diagram has assumed two devices on the same wire, or at most a few devices on the same Ethernet segment. The internet is something else: hundreds of millions of devices, on thousands of independently-administered networks, none of them aware of most of the others, and packets that need to find their way across that graph from any source to any destination. The idea that made this work was small, strange, and explicitly Cold War. It is the subject of the next chapter.