DeDelayed: Deleting Remote Inference Delay via On‑Device Correction – An Easy‑to‑Understand Summary

Introduction

Every day, billions of gigabytes of video are captured by smartphones, dash‑cameras, drones, and wearables. This visual data is the fuel for modern breakthroughs in robotics, autonomous driving, remote sensing, and augmented reality. However, the most accurate video‑understanding models—think of them as the “brains” that can label every pixel in a video frame—are huge, requiring powerful GPUs and lots of memory. For devices that run on a battery or have limited compute (e.g., a car’s dash‑cam, a drone’s onboard computer, or a smartwatch), running these models locally is often impossible.

The common workaround is cloud offloading: the device streams video to a server, the server runs the heavy model, and the result is sent back. While this solves the compute problem, it introduces a new one—latency. Even with fast 5G or Wi‑Fi, the round‑trip time (encoding, sending, inference, and returning the result) can be tens or hundreds of milliseconds, which is too slow for many real‑time applications such as lane‑keeping assistance or obstacle avoidance. ...

April 3, 2026 · 9 min · 1725 words · martinuke0
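The round‑trip described in the summary above (encode, send, infer, return) can be made concrete with a rough latency budget. The figures below are hypothetical placeholders chosen only to illustrate the arithmetic, not measurements from the post:

```python
# Hypothetical end-to-end latency budget for cloud offloading, in ms.
# All figures are illustrative assumptions, not benchmarks.
budget_ms = {
    "encode_frame": 5,       # compress the frame on device
    "uplink": 25,            # stream it to the server over 5G/Wi-Fi
    "server_inference": 40,  # run the heavy per-pixel model on a GPU
    "downlink": 15,          # return the labeled result
}

total = sum(budget_ms.values())
print(f"round-trip: {total} ms -> {1000 / total:.1f} results/s")
# Even this optimistic budget caps throughput well below typical
# 30 fps camera rates, which is the gap on-device correction targets.
```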

Architecting Distributed Agentic Workflows for High Performance Enterprise AI Systems at Scale

Table of Contents

1. Introduction
2. What Are Agentic Workflows?
3. Foundations of Distributed Architecture for AI
4. Core Architectural Patterns
   4.1 Task‑Oriented Micro‑Agents
   4.2 Orchestration vs. Choreography
   4.3 Stateful vs. Stateless Agents
5. Scalability Considerations
   5.1 Horizontal Scaling & Elasticity
   5.2 Load Balancing Strategies
   5.3 Resource‑Aware Scheduling
6. Data Management & Knowledge Sharing
   6.1 Vector Stores & Retrieval
   6.2 Distributed Caching
7. Fault Tolerance & Resilience
   7.1 Retry Policies & Idempotency
   7.2 Circuit Breakers & Bulkheads
8. Security, Governance, and Compliance
9. Practical Implementation: A Real‑World Case Study
   9.1 Problem Statement
   9.2 Solution Architecture Diagram (ASCII)
   9.3 Key Code Snippets
10. Tooling & Platforms Landscape
11. Performance Tuning & Observability
12. Future Directions
13. Conclusion
14. Resources

Introduction

Enterprises are rapidly adopting generative AI to augment decision‑making, automate content creation, and power intelligent assistants. The promise of these systems lies not only in the raw capability of large language models (LLMs) but also in how those models are orchestrated to solve complex, multi‑step problems. Traditional monolithic pipelines quickly become bottlenecks: they struggle with latency, lack fault isolation, and cannot adapt to fluctuating workloads typical of global businesses. ...

April 3, 2026 · 13 min · 2704 words · martinuke0

Architecting Low Latency Stream Processing for Decentralized Financial Intelligence at the Edge

Table of Contents

1. Introduction
2. Why Edge‑Centric, Decentralized Financial Intelligence?
3. Fundamental Challenges
4. Core Architectural Building Blocks
   4.1 Data Ingestion and Normalization
   4.2 Stateful Stream Processing Engine
   4.3 Distributed Consensus & Decentralization Layer
   4.4 Edge Runtime & Execution Model
   4.5 Observability, Security, and Governance
5. Low‑Latency Techniques at the Edge
6. Practical Example: Real‑Time Fraud Detection Pipeline
7. Resilience and Fault Tolerance in a Decentralized Edge
8. Best Practices & Checklist
9. Conclusion
10. Resources

Introduction

Financial markets have become a battleground for speed. From high‑frequency trading (HFT) to real‑time risk monitoring, every microsecond counts. Simultaneously, the rise of decentralized finance (DeFi) and edge‑centric architectures is reshaping how data is produced, moved, and acted upon. Traditional centralized stream‑processing pipelines—often hosted in large data‑centers—struggle to meet the latency, privacy, and resilience demands of modern financial intelligence. ...

April 3, 2026 · 11 min · 2174 words · martinuke0

Scaling Low‑Latency RAG Systems with Vector Databases and Distributed Memory Caching

Introduction

Retrieval‑augmented generation (RAG) has quickly become the de facto pattern for building conversational agents, question‑answering services, and enterprise knowledge assistants. By coupling a large language model (LLM) with a searchable knowledge base, RAG systems can produce answers that are both grounded in factual data and adaptable to new information without retraining the model.

The biggest operational challenge, however, is latency. Users expect sub‑second responses even when the underlying knowledge base contains billions of vectors. Achieving that performance requires a careful blend of: ...

April 3, 2026 · 11 min · 2242 words · martinuke0

Optimizing Retrieval Augmented Generation with Low Latency Graph Embeddings and Hybrid Search Architectures

Introduction

Retrieval‑Augmented Generation (RAG) has emerged as a powerful paradigm for combining the factual grounding of external knowledge bases with the expressive creativity of large language models (LLMs). In a typical RAG pipeline, a retriever fetches relevant documents (or passages) from a corpus, and a generator conditions on those documents to produce answers that are both accurate and fluent. While the conceptual simplicity of this two‑step process is appealing, real‑world deployments quickly run into a latency bottleneck: the retrieval stage must surface the most relevant pieces of information within milliseconds; otherwise the end‑user experience suffers. ...

April 3, 2026 · 11 min · 2277 words · martinuke0
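The two‑step retrieve‑then‑generate loop described in the teaser above can be sketched with a toy in‑memory retriever. The corpus, the hand‑made 3‑d "embeddings", and the stubbed generator below are illustrative stand‑ins, not the hybrid‑search architecture from the post:

```python
import math

# Toy corpus: (passage, hand-made 3-d vector). In a real system these
# vectors would come from an embedding model over a large corpus.
corpus = [
    ("Paris is the capital of France.",         [0.9, 0.1, 0.0]),
    ("GPUs accelerate deep-learning training.", [0.0, 0.8, 0.2]),
    ("The Louvre is a museum in Paris.",        [0.7, 0.2, 0.1]),
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, k=2):
    """Step 1: surface the k passages most similar to the query."""
    ranked = sorted(corpus, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def generate(question, passages):
    """Step 2 (stubbed LLM): condition the answer on retrieved context."""
    context = " ".join(passages)
    return f"Q: {question}\nContext: {context}"

query_vec = [0.85, 0.15, 0.0]  # pretend embedding of the question
prompt = generate("What is the capital of France?", retrieve(query_vec))
print(prompt)
```

The brute‑force scan over every vector is exactly the step that becomes the millisecond‑budget bottleneck at scale, which is where the approximate indexes and hybrid search the post discusses come in.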