Introduction

Kubernetes has become the de‑facto platform for orchestrating containerized workloads, but its networking model is often perceived as a “black box.” Understanding how traffic moves inside a cluster is essential for developers who need to:

  • Debug connectivity issues quickly.
  • Design secure, multi‑tenant applications.
  • Integrate service meshes, API gateways, or custom load balancers.
  • Optimize performance and cost.

This guide dives deep into the internal mechanics of Kubernetes networking. We’ll explore the underlying concepts, the role of the Container Network Interface (CNI), how pods talk to each other, how services are implemented, and how network policies enforce security. Real‑world YAML examples and code snippets illustrate each concept, and a mini‑project demonstrates the ideas in practice.

Note: This article assumes you have a working Kubernetes cluster (v1.27+ recommended) and basic familiarity with pods, deployments, and services.


Table of Contents

  1. Core Networking Concepts
    1.1 Pods and IP Addresses
    1.2 Services Overview
    1.3 Network Policies
  2. The CNI Model
    2.1 CNI Plugin Types
    2.2 Lifecycle of a CNI Call
  3. Pod‑to‑Pod Communication
    3.1 Overlay vs. Underlay Networks
    3.3 Routing Inside the Cluster
  4. Service Implementation Details
    4.1 kube‑proxy Modes (iptables & IPVS)
    4.2 EndpointSlices vs Endpoints
  5. Ingress, Load Balancing, and Service Meshes
  6. DNS and Service Discovery
  7. Network Policy Enforcement Mechanics
  8. Troubleshooting Toolkit
  9. Real‑World Example: Multi‑Tier App with Policies
  10. Best Practices for Developers
  11. Conclusion
  12. Resources

Core Networking Concepts

Pods and IP Addresses

Every pod in Kubernetes receives a unique IP address from the cluster’s pod CIDR (e.g., 10.244.0.0/16). This address is assigned by the node’s CNI plugin and is routable cluster‑wide. The “IP‑per‑pod” model enables:

  • Flat networking – containers within the same pod share the network namespace; containers in different pods can reach each other directly via IP.
  • Port‑level isolation – a pod can expose any port without colliding with other pods because each pod has its own IP.

Example: Pod Manifest

apiVersion: v1
kind: Pod
metadata:
  name: hello-pod
spec:
  containers:
  - name: hello
    image: nginx:alpine
    ports:
    - containerPort: 80

When this pod is scheduled, the node’s CNI plugin will:

  1. Allocate an IP from the node’s pod CIDR.
  2. Create a veth pair (veth0 on the pod, veth1 on the host).
  3. Plug veth1 into the node’s bridge or routing table, depending on the plugin.

Services Overview

A Service is an abstraction that provides a stable virtual IP (ClusterIP) and DNS name for a set of pods. Kubernetes implements three primary Service types:

TypePurposeExposure
ClusterIPInternal load‑balancing inside the clusterOnly cluster‑internal
NodePortExposes the Service on each node’s IP + portExternal (via node IP)
LoadBalancerProvisions an external LB via cloud providerExternal (cloud)
ExternalNameMaps a Service to an external DNS nameDNS alias only

When a Service is created, the control plane creates an Endpoint (or EndpointSlice) object that lists the pod IPs backing the Service. The kube‑proxy component on each node programs the underlying networking layer (iptables or IPVS) to forward traffic from the Service IP to one of the pod IPs.

Example: Service Manifest (ClusterIP)

apiVersion: v1
kind: Service
metadata:
  name: hello-svc
spec:
  selector:
    app: hello
  ports:
  - protocol: TCP
    port: 80      # Service port
    targetPort: 80 # Pod containerPort
  type: ClusterIP

Network Policies

Network Policies let developers enforce pod‑level firewall rules. They are declarative, namespace‑scoped resources that specify allowed inbound/outbound traffic based on selectors, ports, and IP blocks.

Important: Network Policies are opt‑in – if none exist, all traffic is allowed.

Example: Allow HTTP from Frontend Pods Only

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend
  namespace: production
spec:
  podSelector:
    matchLabels:
      role: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: frontend
    ports:
    - protocol: TCP
      port: 8080

The CNI Model

CNI Plugin Types

The Container Network Interface (CNI) is a pluggable framework that decouples Kubernetes from the underlying network implementation. Popular CNI plugins include:

PluginPrimary Use‑CaseArchitectureNotable Features
CalicoL3 routing, network policy, eBPFPure L3, optional BGPHigh performance, IPIP/VXLAN, eBPF dataplane
FlannelSimple overlay (VXLAN, host‑gw)OverlayEasy to set up, limited policy support
CiliumeBPF‑based networking & securityeBPF (XDP)Advanced visibility, L7 policies
Weave NetOverlay with automatic meshOverlay (Weave)Simplicity, built‑in DNS
Kube‑RouterL3 routing + network policyBGP + iptablesMinimal footprint

Each plugin implements two executable binaries (cni and cni-ipam) that Kubernetes calls during pod lifecycle events.

Lifecycle of a CNI Call

When the kubelet creates a pod, the following steps occur:

  1. Pod Scheduling – Scheduler assigns a node.
  2. CNI Add – kubelet invokes the plugin’s ADD command with a JSON payload containing:
    • Pod UID, name, namespace.
    • Network namespace path (/var/run/netns/<pid>).
    • Desired IP address range (if using static IPAM).
  3. Plugin Work – The plugin:
    • Allocates an IP (via its IPAM or external IPAM service).
    • Creates a veth pair and moves one end into the pod’s netns.
    • Configures routes/bridge rules on the host.
    • Optionally programs kernel eBPF maps or iptables for policy enforcement.
  4. Result – Plugin returns the pod IP and interface details to kubelet, which writes them to the pod’s status.

When a pod is deleted, the kubelet invokes the DEL command, prompting the plugin to clean up the veth pair and release the IP.


Pod‑to‑Pod Communication

Overlay vs. Underlay Networks

ApproachHow It WorksProsCons
Overlay (VXLAN, Geneve)Encapsulates pod traffic in UDP packets; each node runs a virtual tunnel endpoint (VTEP).Works on any L2 network; no BGP required.Extra encapsulation overhead; MTU considerations.
Underlay (L3 routing, BGP)Nodes exchange routes (via BGP or static) for each pod CIDR; traffic is routed directly.Low latency, no encapsulation; scales well.Requires routable L3 fabric; more complex network design.

Calico (in pure L3 mode) and Cilium (using eBPF) favor underlay, while Flannel’s default mode uses overlay.

Routing Inside the Cluster

Each node maintains a routing table that knows how to reach every pod CIDR. For underlay, the node runs a BGP daemon (e.g., bird) that advertises its pod CIDR to peers. For overlay, the node’s bridge (cni0 or flannel.1) handles encapsulation.

Example ip route on a node (Calico L3):

$ ip route show table main
10.244.0.0/16 dev cali12345 proto kernel scope link src 10.244.0.5
10.244.1.0/24 via 192.168.1.2 dev eth0  # route to another node's pod CIDR
...

kube‑proxy and Service IP Translation

kube‑proxy runs on every node and watches Service objects. In iptables mode, it creates a chain of NAT rules:

# Example iptables snippet for a ClusterIP Service (10.96.0.10)
-A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m tcp --dport 80 -j KUBE-SVC-XYZ123
-A KUBE-SVC-XYZ123 -m comment --comment "svc: hello-svc" -j KUBE-SEP-ABC456
-A KUBE-SEP-ABC456 -m comment --comment "ep: hello-pod-1" -j DNAT --to-destination 10.244.0.5:80

In IPVS mode, kube‑proxy creates an IPVS virtual server that forwards traffic to real servers (pod IPs) using efficient load‑balancing algorithms (RR, LC, etc.).


Service Implementation Details

kube‑proxy Modes (iptables & IPVS)

ModePerformanceComplexityWhen to Use
iptablesSimple, works everywhere, moderate performance (NAT per packet).Limited load‑balancing algorithms.Small clusters, low traffic, compatibility.
IPVSKernel‑level load balancing (L4) – higher throughput, lower latency.Requires ipvsadm and kernel modules.Medium‑to‑large clusters, high QPS workloads.
userspace (deprecated)Proxy runs in userspace – lowest performance.Rarely used now.Legacy environments.

Switching to IPVS is as easy as:

# Enable IPVS mode on the kube-proxy ConfigMap
kubectl -n kube-system edit configmap/kube-proxy
# set mode: "ipvs"

Service IP Allocation

Kubernetes reserves a service CIDR (e.g., 10.96.0.0/12) for ClusterIP addresses. The kube-apiserver allocates a Service IP from this range on Service creation. The Service IP is virtual – it never exists on any physical interface; traffic to it is intercepted by kube‑proxy.

Endpoints vs. EndpointSlices

Kubernetes originally used Endpoints objects, which list all pod IPs for a Service. As clusters grow, this becomes a scalability bottleneck. EndpointSlices (alpha in v1.16, GA in v1.21) split the list into smaller slices (up to 100 endpoints each) and are more efficient for watch traffic.

Sample EndpointSlice (generated automatically)

apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: hello-svc-xyz
  labels:
    kubernetes.io/service-name: hello-svc
addressType: IPv4
ports:
- name: http
  protocol: TCP
  port: 80
endpoints:
- addresses:
  - 10.244.0.5
  conditions:
    ready: true
- addresses:
  - 10.244.0.8
  conditions:
    ready: true

Ingress, Load Balancing, and Service Meshes

Ingress Controllers

An Ingress resource defines HTTP(S) routing rules (host/path) that a controller implements. Popular controllers include:

  • NGINX Ingress Controller – Configurable, widely used.
  • Traefik – Dynamic, supports middleware.
  • Istio IngressGateway – Part of a service mesh, offers richer L7 features.

When an Ingress object is applied, the controller creates a LoadBalancer Service (or NodePort) that fronts a set of Ingress Pods. These pods then route traffic to backend Services using the same Service IP mechanism described earlier.

Minimal Ingress Example (NGINX)

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: demo.example.com
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-svc
            port:
              number: 80

Service Mesh Integration

Service meshes (e.g., Istio, Linkerd, Consul Connect) deploy a sidecar proxy (Envoy or Linkerd2-proxy) alongside every pod. The mesh intercepts all inbound/outbound traffic via iptables redirect (REDIRECT to 127.0.0.1:<proxy-port>). This enables:

  • Zero‑trust mTLS between services.
  • Fine‑grained L7 routing and retries.
  • Telemetry (metrics, tracing).

The mesh’s control plane configures each sidecar’s routing table via Envoy’s xDS APIs.


DNS and Service Discovery

Kubernetes runs CoreDNS as a cluster‑internal DNS server. Pods resolve Service names automatically:

$ nslookup hello-svc.default.svc.cluster.local
Name:   hello-svc.default.svc.cluster.local
Address: 10.96.0.10

CoreDNS also supports stub domains, external name resolution, and custom plugins (e.g., k8s_external). The DNS zone svc.cluster.local maps Service names to ClusterIP addresses, while pod-ip resolution is handled via the kube-dns plugin.


Network Policy Enforcement Mechanics

Network policies are enforced by the CNI plugin, not by the Kubernetes API server. Implementation differs:

PluginEnforcement Technique
Calicoiptables (legacy) or eBPF (newer) – each policy compiles to a set of rules that match pod selectors and ports.
CiliumeBPF programs attached to the socket or XDP layer – offers L3/L4/L7 enforcement with minimal overhead.
Kube‑Routeriptables – straightforward but less performant at scale.

Example: Calico iptables for the Policy Above

# Generated by calico-node
-A CNI-INPUT -s 10.244.0.0/16 -d 10.244.1.0/24 -p tcp -m multiport --dports 8080 -j ACCEPT
-A CNI-INPUT -j DROP

When a pod attempts to connect to a backend pod on port 8080, the packet traverses the host’s iptables chain CNI-INPUT. If the source IP matches a pod labeled role=frontend, the rule accepts; otherwise, the final DROP rule blocks the traffic.


Troubleshooting Toolkit

ToolUse‑CaseExample Command
kubectl execRun network utilities inside a podkubectl exec -it busybox -- ping 10.244.0.5
Netshoot (debug pod)Full suite of networking toolskubectl run netshoot --image=nicolaka/netshoot --restart=Never -- sleep 3600
kubectl port-forwardAccess a pod locally for debuggingkubectl port-forward pod/hello-pod 8080:80
iptables -L -t nat -n -vInspect kube‑proxy NAT rulessudo iptables -t nat -L KUBE-SERVICES -n -v
ipvsadm -LnView IPVS virtual serverssudo ipvsadm -Ln
cilium monitorObserve eBPF events (Cilium)cilium monitor -t flow
calicoctl get policy -o wideList active Calico policiescalicoctl get networkpolicy -o wide
tcpdump -i anyCapture traffic on the nodesudo tcpdump -i any port 80

Common debugging flow:

  1. Validate DNSnslookup <service> inside the pod.
  2. Check endpoint visibilitykubectl get endpoints <svc> or kubectl get epSlice.
  3. Inspect iptables/IPVS – ensure NAT rules exist for the Service IP.
  4. Test connectivitycurl http://<svc-ip> or curl http://<svc-name>.
  5. Review NetworkPolicy – confirm no deny rules block the traffic.

Real‑World Example: Multi‑Tier App with Policies

We’ll deploy a simple three‑tier web application:

  • frontend – React static files served by Nginx.
  • backend – Go API exposing /api.
  • db – MySQL database.

We’ll enforce that only the backend can talk to the DB, and only the frontend can talk to the backend.

Step 1: Namespace Setup

apiVersion: v1
kind: Namespace
metadata:
  name: demo-app

Step 2: Deploy Backend Service & Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
  namespace: demo-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
      - name: api
        image: ghcr.io/example/go-api:1.0
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: backend-svc
  namespace: demo-app
spec:
  selector:
    app: backend
  ports:
  - port: 80
    targetPort: 8080
  type: ClusterIP

Step 3: Deploy Frontend

apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
  namespace: demo-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: web
        image: nginx:alpine
        volumeMounts:
        - name: static
          mountPath: /usr/share/nginx/html
        ports:
        - containerPort: 80
      volumes:
      - name: static
        configMap:
          name: frontend-html
---
apiVersion: v1
kind: Service
metadata:
  name: frontend-svc
  namespace: demo-app
spec:
  selector:
    app: frontend
  ports:
  - port: 80
    targetPort: 80
  type: ClusterIP

Step 4: Deploy MySQL

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  namespace: demo-app
spec:
  serviceName: mysql
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: secret
        ports:
        - containerPort: 3306
---
apiVersion: v1
kind: Service
metadata:
  name: mysql
  namespace: demo-app
spec:
  selector:
    app: mysql
  ports:
  - port: 3306
    targetPort: 3306
  type: ClusterIP

Step 5: Apply Network Policies

Allow Frontend → Backend (HTTP)

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: demo-app
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 80

Allow Backend → MySQL (MySQL Port)

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-backend-to-mysql
  namespace: demo-app
spec:
  podSelector:
    matchLabels:
      app: mysql
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: backend
    ports:
    - protocol: TCP
      port: 3306

Default Deny (Optional)

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: demo-app
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

Verification

  1. Frontend → Backendcurl http://backend-svc.demo-app.svc.cluster.local/api from a frontend pod should succeed.
  2. Backend → MySQLmysql -h mysql.demo-app.svc.cluster.local -u root -psecret from backend pod works.
  3. Frontend → MySQL – Attempt should be blocked (connection reset).

This end‑to‑end example showcases how Service discovery, kube‑proxy, and NetworkPolicy interplay to enforce a secure, layered architecture.


Best Practices for Developers

AreaRecommendation
Pod CIDR PlanningUse non‑overlapping CIDRs per node (e.g., 10.244.0.0/16 per node) to simplify routing.
CNI SelectionChoose underlay (Calico, Cilium) for performance‑critical workloads; overlay (Flannel) for simplicity in test clusters.
Service TypePrefer ClusterIP + Ingress for HTTP traffic; use NodePort only when external LB is unavailable.
IPVS over iptablesEnable IPVS for high‑throughput services; monitor ipvsadm health.
NetworkPolicy GranularityStart with a default deny policy and add explicit allow rules; use labels to keep policies readable.
ObservabilityDeploy cilium-monitor, kube‑proxy metrics, and CoreDNS health checks to detect routing failures early.
TestingInclude connectivity tests in CI pipelines (kubectl exec + curl/nc) to catch policy regressions.
DocumentationKeep YAML manifests version‑controlled; annotate policies with comments explaining intent.

Conclusion

Kubernetes networking is a layered system that blends L3 routing, L4 load balancing, service discovery, and L7 policy enforcement into a cohesive whole. By understanding the roles of the CNI plugin, kube‑proxy, Service objects, and NetworkPolicies, developers can:

  • Build reliable, secure micro‑service architectures.
  • Diagnose and resolve connectivity issues with confidence.
  • Optimize performance by selecting the right proxy mode and CNI implementation.
  • Integrate advanced features like service meshes without losing visibility into the underlying traffic flow.

Armed with the architectural insights and practical examples presented here, you’re now prepared to design, implement, and troubleshoot Kubernetes networking at scale.


Resources