Kubernetes networking is often considered the “final boss” for system architects. While the platform abstracts away much of the complexity of container orchestration, the underlying networking model is a sophisticated web of IPAM, virtual interfaces, routing tables, and netfilter rules.

Understanding how a packet travels from a user’s browser to a container deep within your cluster is essential for building scalable, secure, and resilient systems. In this guide, we will peel back the layers of the Kubernetes networking stack.

1. The Foundation: The Kubernetes Network Model

Before diving into the “how,” we must understand the “what.” Kubernetes operates on a set of fundamental constraints known as the Kubernetes Network Model. Every compliant cluster must adhere to these three rules:

  1. Pod-to-Pod Communication: Every Pod can communicate with every other Pod in the cluster without Network Address Translation (NAT).
  2. Node-to-Pod Communication: Agents on a node (like the Kubelet) can communicate with all Pods on that node.
  3. Shared Identity: A Pod sees its own IP as the same IP that others see for it.

This “IP-per-Pod” model simplifies port management. Unlike traditional Docker where you might map host port 8080 to container port 80, Kubernetes Pods behave like physical hosts or virtual machines on a flat network.

2. Layer 1: Container-to-Container Networking

Technically, a Pod is not a single container but a group of containers sharing the same Linux namespaces. The most critical for networking is the Network Namespace.

When a Pod starts, a “Pause” container is created first. It holds the network namespace. All other containers in the Pod join this namespace.

  • They share the same IP address.
  • They share the same localhost.
  • They must use different ports to avoid conflicts.

3. Layer 2: Pod-to-Pod Networking (Same Node)

How does traffic move between two Pods on the same worker node? This is handled via Virtual Ethernet (veth) pairs and a Bridge.

  1. Each Pod has an interface (usually eth0) inside its namespace.
  2. This is connected to a veth pair, with the other end residing in the Node’s root namespace.
  3. The host-side ends are connected to a virtual bridge (like cbr0 or docker0).

When Pod A sends a packet to Pod B, the bridge acts as a virtual switch, looking up the MAC address and forwarding the packet to the correct veth interface.

4. Layer 3: Pod-to-Pod Networking (Across Nodes)

This is where things get interesting. Since Pods on different nodes need to talk to each other, the cluster needs a way to route traffic across the physical (or cloud) network. This is the job of the Container Network Interface (CNI).

There are two primary ways CNIs handle this:

Encapsulation (Overlay)

Technologies like VXLAN or Geneve wrap the Pod-to-Pod packet inside a standard UDP host-to-host packet.

  • Pros: Works on almost any network.
  • Cons: Slight performance overhead due to encapsulation headers.
  • Examples: Flannel, Calico (VXLAN mode), Weave.

Direct Routing (Underlay)

The CNI updates the network’s routing tables (often via BGP) so the physical network knows that “Pod Range X is reachable via Node Y.”

  • Pros: High performance, no overhead.
  • Cons: Requires the underlying network to support custom routing or BGP.
  • Examples: Calico (BGP mode), Cilium, AWS/GCP VPC CNI.

5. Layer 4: The Magic of Services and Kube-Proxy

Pod IPs are ephemeral. If a Pod dies, it gets a new IP. We need a stable entry point: the Service.

The Service IP (ClusterIP) is virtual; it does not exist on any interface. Instead, kube-proxy manages the redirection.

IPTables Mode

The most common mode. Kube-proxy creates rules in the Linux kernel’s iptables that intercept traffic destined for a Service IP and randomly select a Pod back-end to forward it to.

IPVS Mode

For large-scale clusters (thousands of services), iptables becomes slow. IPVS (IP Virtual Server) is a kernel-based load balancer that uses hash tables, providing much better performance and more load-balancing algorithms (like Least Connection).

eBPF (The Modern Way)

Tools like Cilium bypass kube-proxy and iptables entirely. They use eBPF programs attached to the kernel’s socket layer to route packets directly, offering massive performance gains and deep observability.

6. Layer 7: Ingress and Gateway API

To get traffic from the outside world into the cluster, we use Ingress Controllers (like NGINX, HAProxy, or Traefik).

  • Ingress: A set of rules that allow external access to services, typically via HTTP/HTTPS.
  • Gateway API: The successor to Ingress, providing a more expressive, role-oriented, and extensible way to manage service networking.

7. Summary Table: The Networking Stack

LevelComponentResponsibility
Podveth pair / Pause containerLocal connectivity and IP sharing
NodeLinux Bridge / CNIMoving packets between Pods on the host
ClusterOverlay (VXLAN) or BGPCross-node communication
Servicekube-proxy / iptablesLoad balancing and stable entry points
EdgeIngress / Gateway APIExternal traffic and TLS termination

Conclusion

Mastering Kubernetes networking requires moving beyond the YAML and understanding the Linux kernel primitives that make abstraction possible. Whether you are troubleshooting a 503 error or designing a high-throughput financial system, knowing how veth pairs, CNI plugins, and kube-proxy interact will give you the edge as a system architect.

As you continue your journey, I recommend experimenting with different CNI providers in a lab environment and using tools like tcpdump and ip route on your worker nodes to see the traffic in action.

Resources for Further Learning