Introduction
Kubernetes has become the de‑facto platform for orchestrating containerized workloads, but its networking model is often perceived as a “black box.” Understanding how traffic moves inside a cluster is essential for developers who need to:
- Debug connectivity issues quickly.
- Design secure, multi‑tenant applications.
- Integrate service meshes, API gateways, or custom load balancers.
- Optimize performance and cost.
This guide dives deep into the internal mechanics of Kubernetes networking. We’ll explore the underlying concepts, the role of the Container Network Interface (CNI), how pods talk to each other, how services are implemented, and how network policies enforce security. Real‑world YAML examples and code snippets illustrate each concept, and a mini‑project demonstrates the ideas in practice.
Note: This article assumes you have a working Kubernetes cluster (v1.27+ recommended) and basic familiarity with pods, deployments, and services.
Table of Contents
- Core Networking Concepts
1.1 Pods and IP Addresses
1.2 Services Overview
1.3 Network Policies - The CNI Model
2.1 CNI Plugin Types
2.2 Lifecycle of a CNI Call - Pod‑to‑Pod Communication
3.1 Overlay vs. Underlay Networks
3.3 Routing Inside the Cluster - Service Implementation Details
4.1 kube‑proxy Modes (iptables & IPVS)
4.2 EndpointSlices vs Endpoints - Ingress, Load Balancing, and Service Meshes
- DNS and Service Discovery
- Network Policy Enforcement Mechanics
- Troubleshooting Toolkit
- Real‑World Example: Multi‑Tier App with Policies
- Best Practices for Developers
- Conclusion
- Resources
Core Networking Concepts
Pods and IP Addresses
Every pod in Kubernetes receives a unique IP address from the cluster’s pod CIDR (e.g., 10.244.0.0/16). This address is assigned by the node’s CNI plugin and is routable cluster‑wide. The “IP‑per‑pod” model enables:
- Flat networking – containers within the same pod share the network namespace; containers in different pods can reach each other directly via IP.
- Port‑level isolation – a pod can expose any port without colliding with other pods because each pod has its own IP.
Example: Pod Manifest
apiVersion: v1
kind: Pod
metadata:
name: hello-pod
spec:
containers:
- name: hello
image: nginx:alpine
ports:
- containerPort: 80
When this pod is scheduled, the node’s CNI plugin will:
- Allocate an IP from the node’s pod CIDR.
- Create a veth pair (
veth0on the pod,veth1on the host). - Plug
veth1into the node’s bridge or routing table, depending on the plugin.
Services Overview
A Service is an abstraction that provides a stable virtual IP (ClusterIP) and DNS name for a set of pods. Kubernetes implements three primary Service types:
| Type | Purpose | Exposure |
|---|---|---|
| ClusterIP | Internal load‑balancing inside the cluster | Only cluster‑internal |
| NodePort | Exposes the Service on each node’s IP + port | External (via node IP) |
| LoadBalancer | Provisions an external LB via cloud provider | External (cloud) |
| ExternalName | Maps a Service to an external DNS name | DNS alias only |
When a Service is created, the control plane creates an Endpoint (or EndpointSlice) object that lists the pod IPs backing the Service. The kube‑proxy component on each node programs the underlying networking layer (iptables or IPVS) to forward traffic from the Service IP to one of the pod IPs.
Example: Service Manifest (ClusterIP)
apiVersion: v1
kind: Service
metadata:
name: hello-svc
spec:
selector:
app: hello
ports:
- protocol: TCP
port: 80 # Service port
targetPort: 80 # Pod containerPort
type: ClusterIP
Network Policies
Network Policies let developers enforce pod‑level firewall rules. They are declarative, namespace‑scoped resources that specify allowed inbound/outbound traffic based on selectors, ports, and IP blocks.
Important: Network Policies are opt‑in – if none exist, all traffic is allowed.
Example: Allow HTTP from Frontend Pods Only
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: backend-allow-frontend
namespace: production
spec:
podSelector:
matchLabels:
role: backend
policyTypes:
- Ingress
ingress:
- from:
- podSelector:
matchLabels:
role: frontend
ports:
- protocol: TCP
port: 8080
The CNI Model
CNI Plugin Types
The Container Network Interface (CNI) is a pluggable framework that decouples Kubernetes from the underlying network implementation. Popular CNI plugins include:
| Plugin | Primary Use‑Case | Architecture | Notable Features |
|---|---|---|---|
| Calico | L3 routing, network policy, eBPF | Pure L3, optional BGP | High performance, IPIP/VXLAN, eBPF dataplane |
| Flannel | Simple overlay (VXLAN, host‑gw) | Overlay | Easy to set up, limited policy support |
| Cilium | eBPF‑based networking & security | eBPF (XDP) | Advanced visibility, L7 policies |
| Weave Net | Overlay with automatic mesh | Overlay (Weave) | Simplicity, built‑in DNS |
| Kube‑Router | L3 routing + network policy | BGP + iptables | Minimal footprint |
Each plugin implements two executable binaries (cni and cni-ipam) that Kubernetes calls during pod lifecycle events.
Lifecycle of a CNI Call
When the kubelet creates a pod, the following steps occur:
- Pod Scheduling – Scheduler assigns a node.
- CNI Add – kubelet invokes the plugin’s
ADDcommand with a JSON payload containing:- Pod UID, name, namespace.
- Network namespace path (
/var/run/netns/<pid>). - Desired IP address range (if using static IPAM).
- Plugin Work – The plugin:
- Allocates an IP (via its IPAM or external IPAM service).
- Creates a veth pair and moves one end into the pod’s netns.
- Configures routes/bridge rules on the host.
- Optionally programs kernel eBPF maps or iptables for policy enforcement.
- Result – Plugin returns the pod IP and interface details to kubelet, which writes them to the pod’s status.
When a pod is deleted, the kubelet invokes the DEL command, prompting the plugin to clean up the veth pair and release the IP.
Pod‑to‑Pod Communication
Overlay vs. Underlay Networks
| Approach | How It Works | Pros | Cons |
|---|---|---|---|
| Overlay (VXLAN, Geneve) | Encapsulates pod traffic in UDP packets; each node runs a virtual tunnel endpoint (VTEP). | Works on any L2 network; no BGP required. | Extra encapsulation overhead; MTU considerations. |
| Underlay (L3 routing, BGP) | Nodes exchange routes (via BGP or static) for each pod CIDR; traffic is routed directly. | Low latency, no encapsulation; scales well. | Requires routable L3 fabric; more complex network design. |
Calico (in pure L3 mode) and Cilium (using eBPF) favor underlay, while Flannel’s default mode uses overlay.
Routing Inside the Cluster
Each node maintains a routing table that knows how to reach every pod CIDR. For underlay, the node runs a BGP daemon (e.g., bird) that advertises its pod CIDR to peers. For overlay, the node’s bridge (cni0 or flannel.1) handles encapsulation.
Example ip route on a node (Calico L3):
$ ip route show table main
10.244.0.0/16 dev cali12345 proto kernel scope link src 10.244.0.5
10.244.1.0/24 via 192.168.1.2 dev eth0 # route to another node's pod CIDR
...
kube‑proxy and Service IP Translation
kube‑proxy runs on every node and watches Service objects. In iptables mode, it creates a chain of NAT rules:
# Example iptables snippet for a ClusterIP Service (10.96.0.10)
-A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m tcp --dport 80 -j KUBE-SVC-XYZ123
-A KUBE-SVC-XYZ123 -m comment --comment "svc: hello-svc" -j KUBE-SEP-ABC456
-A KUBE-SEP-ABC456 -m comment --comment "ep: hello-pod-1" -j DNAT --to-destination 10.244.0.5:80
In IPVS mode, kube‑proxy creates an IPVS virtual server that forwards traffic to real servers (pod IPs) using efficient load‑balancing algorithms (RR, LC, etc.).
Service Implementation Details
kube‑proxy Modes (iptables & IPVS)
| Mode | Performance | Complexity | When to Use |
|---|---|---|---|
| iptables | Simple, works everywhere, moderate performance (NAT per packet). | Limited load‑balancing algorithms. | Small clusters, low traffic, compatibility. |
| IPVS | Kernel‑level load balancing (L4) – higher throughput, lower latency. | Requires ipvsadm and kernel modules. | Medium‑to‑large clusters, high QPS workloads. |
| userspace (deprecated) | Proxy runs in userspace – lowest performance. | Rarely used now. | Legacy environments. |
Switching to IPVS is as easy as:
# Enable IPVS mode on the kube-proxy ConfigMap
kubectl -n kube-system edit configmap/kube-proxy
# set mode: "ipvs"
Service IP Allocation
Kubernetes reserves a service CIDR (e.g., 10.96.0.0/12) for ClusterIP addresses. The kube-apiserver allocates a Service IP from this range on Service creation. The Service IP is virtual – it never exists on any physical interface; traffic to it is intercepted by kube‑proxy.
Endpoints vs. EndpointSlices
Kubernetes originally used Endpoints objects, which list all pod IPs for a Service. As clusters grow, this becomes a scalability bottleneck. EndpointSlices (alpha in v1.16, GA in v1.21) split the list into smaller slices (up to 100 endpoints each) and are more efficient for watch traffic.
Sample EndpointSlice (generated automatically)
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
name: hello-svc-xyz
labels:
kubernetes.io/service-name: hello-svc
addressType: IPv4
ports:
- name: http
protocol: TCP
port: 80
endpoints:
- addresses:
- 10.244.0.5
conditions:
ready: true
- addresses:
- 10.244.0.8
conditions:
ready: true
Ingress, Load Balancing, and Service Meshes
Ingress Controllers
An Ingress resource defines HTTP(S) routing rules (host/path) that a controller implements. Popular controllers include:
- NGINX Ingress Controller – Configurable, widely used.
- Traefik – Dynamic, supports middleware.
- Istio IngressGateway – Part of a service mesh, offers richer L7 features.
When an Ingress object is applied, the controller creates a LoadBalancer Service (or NodePort) that fronts a set of Ingress Pods. These pods then route traffic to backend Services using the same Service IP mechanism described earlier.
Minimal Ingress Example (NGINX)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: web-ingress
annotations:
nginx.ingress.kubernetes.io/rewrite-target: /
spec:
rules:
- host: demo.example.com
http:
paths:
- path: /api
pathType: Prefix
backend:
service:
name: api-svc
port:
number: 80
Service Mesh Integration
Service meshes (e.g., Istio, Linkerd, Consul Connect) deploy a sidecar proxy (Envoy or Linkerd2-proxy) alongside every pod. The mesh intercepts all inbound/outbound traffic via iptables redirect (REDIRECT to 127.0.0.1:<proxy-port>). This enables:
- Zero‑trust mTLS between services.
- Fine‑grained L7 routing and retries.
- Telemetry (metrics, tracing).
The mesh’s control plane configures each sidecar’s routing table via Envoy’s xDS APIs.
DNS and Service Discovery
Kubernetes runs CoreDNS as a cluster‑internal DNS server. Pods resolve Service names automatically:
$ nslookup hello-svc.default.svc.cluster.local
Name: hello-svc.default.svc.cluster.local
Address: 10.96.0.10
CoreDNS also supports stub domains, external name resolution, and custom plugins (e.g., k8s_external). The DNS zone svc.cluster.local maps Service names to ClusterIP addresses, while pod-ip resolution is handled via the kube-dns plugin.
Network Policy Enforcement Mechanics
Network policies are enforced by the CNI plugin, not by the Kubernetes API server. Implementation differs:
| Plugin | Enforcement Technique |
|---|---|
| Calico | iptables (legacy) or eBPF (newer) – each policy compiles to a set of rules that match pod selectors and ports. |
| Cilium | eBPF programs attached to the socket or XDP layer – offers L3/L4/L7 enforcement with minimal overhead. |
| Kube‑Router | iptables – straightforward but less performant at scale. |
Example: Calico iptables for the Policy Above
# Generated by calico-node
-A CNI-INPUT -s 10.244.0.0/16 -d 10.244.1.0/24 -p tcp -m multiport --dports 8080 -j ACCEPT
-A CNI-INPUT -j DROP
When a pod attempts to connect to a backend pod on port 8080, the packet traverses the host’s iptables chain CNI-INPUT. If the source IP matches a pod labeled role=frontend, the rule accepts; otherwise, the final DROP rule blocks the traffic.
Troubleshooting Toolkit
| Tool | Use‑Case | Example Command |
|---|---|---|
kubectl exec | Run network utilities inside a pod | kubectl exec -it busybox -- ping 10.244.0.5 |
| Netshoot (debug pod) | Full suite of networking tools | kubectl run netshoot --image=nicolaka/netshoot --restart=Never -- sleep 3600 |
kubectl port-forward | Access a pod locally for debugging | kubectl port-forward pod/hello-pod 8080:80 |
iptables -L -t nat -n -v | Inspect kube‑proxy NAT rules | sudo iptables -t nat -L KUBE-SERVICES -n -v |
ipvsadm -Ln | View IPVS virtual servers | sudo ipvsadm -Ln |
cilium monitor | Observe eBPF events (Cilium) | cilium monitor -t flow |
calicoctl get policy -o wide | List active Calico policies | calicoctl get networkpolicy -o wide |
tcpdump -i any | Capture traffic on the node | sudo tcpdump -i any port 80 |
Common debugging flow:
- Validate DNS –
nslookup <service>inside the pod. - Check endpoint visibility –
kubectl get endpoints <svc>orkubectl get epSlice. - Inspect iptables/IPVS – ensure NAT rules exist for the Service IP.
- Test connectivity –
curl http://<svc-ip>orcurl http://<svc-name>. - Review NetworkPolicy – confirm no deny rules block the traffic.
Real‑World Example: Multi‑Tier App with Policies
We’ll deploy a simple three‑tier web application:
- frontend – React static files served by Nginx.
- backend – Go API exposing
/api. - db – MySQL database.
We’ll enforce that only the backend can talk to the DB, and only the frontend can talk to the backend.
Step 1: Namespace Setup
apiVersion: v1
kind: Namespace
metadata:
name: demo-app
Step 2: Deploy Backend Service & Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: backend
namespace: demo-app
spec:
replicas: 3
selector:
matchLabels:
app: backend
template:
metadata:
labels:
app: backend
spec:
containers:
- name: api
image: ghcr.io/example/go-api:1.0
ports:
- containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
name: backend-svc
namespace: demo-app
spec:
selector:
app: backend
ports:
- port: 80
targetPort: 8080
type: ClusterIP
Step 3: Deploy Frontend
apiVersion: apps/v1
kind: Deployment
metadata:
name: frontend
namespace: demo-app
spec:
replicas: 2
selector:
matchLabels:
app: frontend
template:
metadata:
labels:
app: frontend
spec:
containers:
- name: web
image: nginx:alpine
volumeMounts:
- name: static
mountPath: /usr/share/nginx/html
ports:
- containerPort: 80
volumes:
- name: static
configMap:
name: frontend-html
---
apiVersion: v1
kind: Service
metadata:
name: frontend-svc
namespace: demo-app
spec:
selector:
app: frontend
ports:
- port: 80
targetPort: 80
type: ClusterIP
Step 4: Deploy MySQL
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: mysql
namespace: demo-app
spec:
serviceName: mysql
replicas: 1
selector:
matchLabels:
app: mysql
template:
metadata:
labels:
app: mysql
spec:
containers:
- name: mysql
image: mysql:8
env:
- name: MYSQL_ROOT_PASSWORD
value: secret
ports:
- containerPort: 3306
---
apiVersion: v1
kind: Service
metadata:
name: mysql
namespace: demo-app
spec:
selector:
app: mysql
ports:
- port: 3306
targetPort: 3306
type: ClusterIP
Step 5: Apply Network Policies
Allow Frontend → Backend (HTTP)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-frontend-to-backend
namespace: demo-app
spec:
podSelector:
matchLabels:
app: backend
policyTypes:
- Ingress
ingress:
- from:
- podSelector:
matchLabels:
app: frontend
ports:
- protocol: TCP
port: 80
Allow Backend → MySQL (MySQL Port)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-backend-to-mysql
namespace: demo-app
spec:
podSelector:
matchLabels:
app: mysql
policyTypes:
- Ingress
ingress:
- from:
- podSelector:
matchLabels:
app: backend
ports:
- protocol: TCP
port: 3306
Default Deny (Optional)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny-all
namespace: demo-app
spec:
podSelector: {}
policyTypes:
- Ingress
- Egress
Verification
- Frontend → Backend –
curl http://backend-svc.demo-app.svc.cluster.local/apifrom a frontend pod should succeed. - Backend → MySQL –
mysql -h mysql.demo-app.svc.cluster.local -u root -psecretfrom backend pod works. - Frontend → MySQL – Attempt should be blocked (connection reset).
This end‑to‑end example showcases how Service discovery, kube‑proxy, and NetworkPolicy interplay to enforce a secure, layered architecture.
Best Practices for Developers
| Area | Recommendation |
|---|---|
| Pod CIDR Planning | Use non‑overlapping CIDRs per node (e.g., 10.244.0.0/16 per node) to simplify routing. |
| CNI Selection | Choose underlay (Calico, Cilium) for performance‑critical workloads; overlay (Flannel) for simplicity in test clusters. |
| Service Type | Prefer ClusterIP + Ingress for HTTP traffic; use NodePort only when external LB is unavailable. |
| IPVS over iptables | Enable IPVS for high‑throughput services; monitor ipvsadm health. |
| NetworkPolicy Granularity | Start with a default deny policy and add explicit allow rules; use labels to keep policies readable. |
| Observability | Deploy cilium-monitor, kube‑proxy metrics, and CoreDNS health checks to detect routing failures early. |
| Testing | Include connectivity tests in CI pipelines (kubectl exec + curl/nc) to catch policy regressions. |
| Documentation | Keep YAML manifests version‑controlled; annotate policies with comments explaining intent. |
Conclusion
Kubernetes networking is a layered system that blends L3 routing, L4 load balancing, service discovery, and L7 policy enforcement into a cohesive whole. By understanding the roles of the CNI plugin, kube‑proxy, Service objects, and NetworkPolicies, developers can:
- Build reliable, secure micro‑service architectures.
- Diagnose and resolve connectivity issues with confidence.
- Optimize performance by selecting the right proxy mode and CNI implementation.
- Integrate advanced features like service meshes without losing visibility into the underlying traffic flow.
Armed with the architectural insights and practical examples presented here, you’re now prepared to design, implement, and troubleshoot Kubernetes networking at scale.