Introduction

Kubernetes has become the de‑facto platform for running containerized workloads at scale. While most developers interact with the API server, pods, and services daily, the underlying networking layer remains a black box for many. Yet, a solid grasp of how Kubernetes networking works internally is essential for backend engineers who need to:

  • Diagnose connectivity issues quickly.
  • Design resilient multi‑tier applications.
  • Implement secure network policies.
  • Choose the right CNI plugin for their workload characteristics.

This guide dives deep into the internals of Kubernetes networking, covering everything from the Linux network namespace that isolates each pod to the sophisticated routing performed by kube-proxy. Along the way, you’ll find practical code snippets, YAML examples, and real‑world context that you can apply to production clusters today.


Core Concepts

Pods and IP Addresses

At the heart of Kubernetes networking is the pod—the smallest deployable unit. Each pod runs one or more containers sharing:

  • A network namespace (isolated network stack).
  • An IP address that is routable throughout the cluster.
  • A port space (all containers in a pod share the same ports).

Note: Pods are flat in the networking sense—there is no concept of “sub‑net per namespace” unless the chosen CNI implements it.

When a pod is created, the kubelet asks the configured Container Network Interface (CNI) plugin to allocate an IP address from the cluster’s pod CIDR (e.g., 10.244.0.0/16). The plugin also sets up a virtual Ethernet pair (veth) that bridges the pod’s network namespace to the node’s root namespace, enabling traffic to leave the host.

Services and Virtual IPs

A Service abstracts a set of pods behind a stable ClusterIP (virtual IP). Clients inside the cluster can reach the service via my-service.my-namespace.svc.cluster.local. The service object does not own a physical NIC; instead, it relies on kube-proxy to program rules that translate the virtual IP to the actual pod IPs.

Network Namespaces

Linux network namespaces provide the isolation that makes pod‑level networking possible. Each pod gets its own namespace, with its own:

  • IP address.
  • Routing table.
  • Firewall rules (iptables/nftables).

The node’s root namespace contains the bridge (or overlay device) that connects all pod veth pairs. This bridge is where inter‑pod traffic is switched or routed, depending on the CNI implementation.


The CNI (Container Network Interface) Model

CNI is a specification and runtime for configuring network interfaces in Linux containers. Kubernetes delegates the low‑level plumbing to CNI plugins, which can be broadly classified into two categories:

CategoryTypical Use‑CaseExample Plugins
OverlayClusters spanning multiple subnets or cloud regions; abstracts the underlying network.Flannel (VXLAN), Weave Net
Underlay / NativeHigh‑performance, low‑latency workloads; leverages existing L2/L3 fabric.Calico (BGP), Cilium (eBPF)

CNI Plugin Lifecycle

When a pod is scheduled, the kubelet invokes the CNI binary with a ADD command:

/opt/cni/bin/calico \
  ADD \
  /var/run/cni/netns/<pod-netns-id> \
  /etc/cni/net.d/10-calico.conflist

The plugin:

  1. Allocates an IP from its pool (or via IPAM).
  2. Creates a veth pair (veth0 in the pod, veth1 on the host).
  3. Attaches veth1 to a bridge or routes it via BGP.
  4. Optionally applies network policies (e.g., Calico’s policy engine).

On pod termination, the kubelet calls the DEL command, and the plugin tears down the interface and releases the IP.


Pod‑to‑Pod Communication

Overlay vs. Underlay

  • Overlay Networks (e.g., Flannel VXLAN) encapsulate pod traffic in a second tunnel header (VXLAN, Geneve). This allows pods on different nodes to communicate even if the underlying network does not provide direct routing for pod CIDRs.

  • Underlay Networks (e.g., Calico with BGP) advertise each node’s pod CIDR directly into the data‑center routing fabric. Traffic flows natively, avoiding encapsulation overhead.

IP Masquerading and NAT

Kubernetes performs masquerade (SNAT) for traffic that exits the cluster (e.g., to the internet). By default, the iptables chain POSTROUTING rewrites the source IP to the node’s primary IP for pods whose destination is not within the cluster CIDR.

iptables -t nat -A POSTROUTING -s 10.244.0.0/16 ! -d 10.244.0.0/16 -j MASQUERADE

Masquerading can be disabled per‑namespace or per‑pod using the IPMasqAgent or by annotating the pod with k8s.v1.cni.cncf.io/networks.


Service Types and Their Networking

Service TypeCluster‑Internal BehaviorExternal Exposure
ClusterIPVirtual IP reachable only inside the cluster.None
NodePortExposes the same ClusterIP on a static port of each node (30000‑32767).Direct node‑level access
LoadBalancerProvisions an external load balancer (cloud provider) that forwards to NodePorts.Cloud‑LB IP
ExternalNameMaps a service name to an external DNS name; no IP allocation.DNS CNAME

ClusterIP Implementation

When a client sends traffic to a ClusterIP, kube-proxy intercepts the packet (via iptables or IPVS) and performs DNAT to one of the backend pod IPs:

# iptables example (simplified)
-A KUBE-SVC-XXXXX -j KUBE-SEP-YYYYY
-A KUBE-SEP-YYYYY -s 10.244.1.5 -j DNAT --to-destination 10.244.2.7:8080

kube-proxy also maintains session affinity (if enabled) by persisting the chosen backend for a given client IP.

NodePort Flow

  1. External client → NodeIP:NodePort.
  2. kube-proxy DNATs to the ClusterIP.
  3. Step 2 repeats the standard ClusterIP flow.

LoadBalancer Flow

The cloud provider’s LB forwards traffic to the node’s NodePort, after which the same path as above is followed.


kube-proxy: How It Implements Service Routing

kube-proxy runs as a daemonset on every node. It watches the API server for Service and Endpoint objects and programs the node’s kernel accordingly. There are two primary modes:

iptables Mode

  • Stateless: each packet is processed by a chain of iptables rules.
  • Pros: No extra userspace process; low overhead.
  • Cons: Large rule sets can degrade performance at scale.

Example iptables Chains

# KUBE-SERVICES chain contains a rule for each Service
-A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m comment --comment "my-svc" -m recent --name KUBE-SEP-abc123 --rsource -j KUBE-SEP-abc123

# KUBE-SEP-abc123 performs DNAT to a random pod endpoint
-A KUBE-SEP-abc123 -m statistic --mode random --probability 0.3333333333 -j KUBE-SEP-1
-A KUBE-SEP-abc123 -m statistic --mode random --probability 0.5000000000 -j KUBE-SEP-2
-A KUBE-SEP-abc123 -j KUBE-SEP-3

IPVS Mode

  • Stateful: uses Linux IP Virtual Server (IPVS) to create virtual services and real servers.
  • Pros: Handles tens of thousands of services with minimal rule bloat; supports more load‑balancing algorithms.
  • Cons: Requires the ipvsadm package and kernel modules.

IPVS Configuration Example

# Create a virtual service for the ClusterIP
ipvsadm -A -t 10.96.0.1:80 -s rr

# Add real servers (pod IPs)
ipvsadm -a -t 10.96.0.1:80 -r 10.244.2.7:8080 -g
ipvsadm -a -t 10.96.0.1:80 -r 10.244.2.8:8080 -g

Both modes ultimately achieve the same result—mapping a stable virtual IP to a set of pod IPs—but IPVS is preferred for high‑throughput, large‑scale clusters.


DNS in Kubernetes

Kubernetes provides service discovery via an internal DNS server, usually CoreDNS. Each Service gets a DNS entry of the form:

<service-name>.<namespace>.svc.cluster.local

CoreDNS runs as a Deployment with a ConfigMap that determines the upstream resolvers and stub zones. When a pod queries my-service, the DNS request is resolved to the Service’s ClusterIP, after which the kube-proxy routing steps apply.

CoreDNS Configuration Snippet

apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }

Network Policies

Network Policies let you enforce zero‑trust security at the pod level. They are implemented by the underlying CNI (e.g., Calico, Cilium) and translated into iptables or eBPF rules.

Example NetworkPolicy YAML

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend
  namespace: prod
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080

This policy allows only pods labeled app=frontend to reach app=backend on port 8080. All other traffic is dropped.

Tip: When using Calico, you can inspect the generated iptables rules with calicoctl get policy -o yaml.


Ingress Controllers and Service Meshes

Ingress Basics

An Ingress resource defines HTTP(S) routing rules for external traffic, but it requires an Ingress Controller (e.g., NGINX, HAProxy, Traefik) to implement those rules. The controller watches Ingress objects and configures a load balancer (often a NodePort + external LB) accordingly.

Minimal Ingress Example

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
  namespace: prod
spec:
  ingressClassName: nginx
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /v1
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 80

Service Mesh (Istio) Networking

A service mesh injects a sidecar proxy (Envoy) into each pod, turning the mesh into a layer‑7 networking fabric. The mesh handles:

  • Fine‑grained traffic routing (canary, A/B testing).
  • Mutual TLS (mTLS) for pod‑to‑pod encryption.
  • Observability (metrics, tracing).

Istio’s control plane (istiod) distributes EnvoyFilter configurations, while the data plane proxies intercept all inbound/outbound traffic, effectively bypassing kube-proxy for intra‑mesh communication.


Observability and Troubleshooting

Essential Tools

ToolUse‑Case
kubectl exec -it <pod> -- shInspect container network namespace
nsenter (via netshoot image)Jump into host or pod netns
tcpdump -i anyCapture live traffic
ip netns listList namespaces (useful on node)
iptables -L -t nat -n -vVerify DNAT/SNAT rules
ipvsadm -LnInspect IPVS virtual services
calicoctl get policyView Calico network policies
cilium monitorReal‑time eBPF event stream

Debugging Workflow

  1. Validate DNSnslookup my-service.my-namespace.svc.cluster.local.
  2. Check Service Endpointskubectl get endpoints my-service -o yaml.
  3. Inspect iptables/IPVS – ensure DNAT rules exist for the Service IP.
  4. Trace the packet – use tcpdump on the node’s bridge (cbr0 or flannel.1).
  5. Confirm CNI statecalicoctl node status or cilium status.
  6. Review NetworkPolicy logs – some CNIs emit logs when a policy blocks traffic.

Real‑World Example: Deploying a Multi‑Tier Application

Consider a classic three‑tier web app:

  1. Frontend (React) → Service frontend.
  2. Backend API (Go) → Service api.
  3. Database (PostgreSQL) → Service postgres.

Deployment Manifests

# frontend-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
  labels:
    app: frontend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: nginx
        image: nginx:stable-alpine
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: frontend
  labels:
    app: frontend
spec:
  selector:
    app: frontend
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: LoadBalancer
# api-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  labels:
    app: api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: myorg/api:1.2.3
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: api
  labels:
    app: api
spec:
  selector:
    app: api
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  type: ClusterIP
# postgres-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
  labels:
    app: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:13
        env:
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: pg-secret
              key: password
        ports:
        - containerPort: 5432
---
apiVersion: v1
kind: Service
metadata:
  name: postgres
  labels:
    app: postgres
spec:
  selector:
    app: postgres
  ports:
  - protocol: TCP
    port: 5432
    targetPort: 5432
  type: ClusterIP

Network Flow Description

  1. Client → Frontend: External traffic hits the cloud LB, which forwards to a node’s NodePort. kube-proxy DNATs to the frontend Service’s ClusterIP, then to one of the three NGINX pods.
  2. Frontend → API: The NGINX container resolves api via CoreDNS (api.default.svc.cluster.local → ClusterIP). The request is DNAT‑routed to one of the API pods.
  3. API → PostgreSQL: The Go process connects to postgres:5432. CoreDNS resolves the Service, kube-proxy routes to the single PostgreSQL pod.
  4. Response Path: Replies follow the reverse path, with NAT ensuring the source IP appears as the originating Service IP (unless externalTrafficPolicy: Local is set on the LoadBalancer).

Performance tip: If latency between API and DB is a concern, consider colocating them on the same node and using a hostNetwork: true pod for the DB (only for trusted workloads) to eliminate an extra hop.


Best Practices

  1. Plan Pod CIDRs Carefully

    • Avoid overlapping CIDRs across clusters or VPNs.
    • Reserve a /16 per cluster for future scaling.
  2. Choose the Right CNI for Your Workload

    • Overlay (Flannel, Weave) – Simple, works in most cloud VPCs.
    • BGP (Calico) – Low overhead, good for large clusters.
    • eBPF (Cilium) – High performance, advanced security (transparent encryption).
  3. Secure Ingress/Egress

    • Enforce NetworkPolicies at namespace boundaries.
    • Use PodSecurityPolicies (or the newer Pod Security Standards) to restrict privileged containers.
  4. Monitor kube-proxy Mode

    • For >10k Services, switch to IPVS (--proxy-mode=ipvs).
    • Verify kernel modules: lsmod | grep ip_vs.
  5. Leverage Service Mesh Sparingly

    • Meshes add latency and operational complexity. Deploy only where you need advanced traffic management or zero‑trust mTLS.
  6. Enable IP Masquerade Agent (if needed)

    • Prevent IP spoofing and ensure correct source IPs for external services.
  7. Regularly Audit DNS

    • Run coredns health checks and monitor query latency; DNS failures are a common cause of “service not reachable” alerts.

Conclusion

Kubernetes networking is a layered masterpiece: from the low‑level Linux namespaces that give each pod its own IP stack, through the CNI plugins that weave those namespaces into a coherent cluster‑wide fabric, up to the higher‑level abstractions of Services, Ingress, and Service Meshes. Understanding the internals—how kube-proxy programs iptables or IPVS, how CoreDNS resolves service names, and how network policies translate into kernel rules—empowers backend engineers to design resilient architectures, troubleshoot elusive connectivity bugs, and make informed decisions about CNI and security strategies.

By mastering these concepts, you’ll be able to:

  • Diagnose and fix network issues faster than ever.
  • Optimize performance by selecting the right networking stack.
  • Harden your workloads with fine‑grained policies.
  • Seamlessly integrate advanced traffic routing via Ingress or a Service Mesh.

Kubernetes continues to evolve, but the foundational networking principles covered here will remain relevant as the ecosystem matures. Keep experimenting, stay current with CNI releases, and always pair theory with hands‑on validation in a test cluster.


Resources

  1. Kubernetes Official Documentation – Networking
    https://kubernetes.io/docs/concepts/cluster-administration/networking/

  2. CNI Specification
    https://github.com/containernetworking/cni

  3. Calico Network Policies Guide
    https://projectcalico.org/docs/

  4. Cilium eBPF Networking
    https://cilium.io/

  5. IPVS – Linux Virtual Server
    https://www.linuxvirtualserver.org/software/ipvs.html

Feel free to dive into these resources to deepen your understanding, experiment with different CNI plugins, and stay up‑to‑date with the latest networking advances in Kubernetes. Happy clustering!