Personal Engineering Lab · Casale Monferrato, IT
Understanding systems by building, measuring and documenting them.
Powered by caffeine and reboots.
About
Dielabs investigates how architectures shape workload behavior in AI systems. It focuses on the layers that make inference a service (runtime, serving stack, and hardware and fabric) and on the system behavior they produce, from latency and throughput dynamics to capacity under load.
I design, benchmark, and validate inference architectures to map performance, trade-offs, and real-world limits — and turn these findings into opinionated frameworks for day 0–1 decisions: sizing, capacity planning, and workload deployment.
My background is in enterprise datacenter and presales engineering. I now focus fully on GPU-accelerated inference stacks and the operational discipline required to run them at scale.
What I work on
Homelab node, DCGM, driver stack
vLLM internals, KV cache, batching
GuideLLM sweeps, crossover analysis
Prometheus, Grafana, PromQL, DCGM
KV offload, LMCache, quantization
TP/DP patterns, NUMA, CPU-GPU isomorphism
Methodology
From a business need to a system in production: a set of frameworks that turns LLM inference deployment into a sequence of named, defensible decisions, each anchored in lab evidence rather than vendor slides.
Repository
I design, test and deploy systems. AI accelerates the build.