AI Infrastructure Engineering

Personal Engineering Lab · Casale Monferrato, IT

Understanding systems by building, measuring and documenting them.
Powered by caffeine and reboots.

GPU Node (32 GB RAM, RTX 4070 Super 12 GB VRAM) · Dell PowerEdge R730 (dual socket, 128 GB RAM)
vLLM · LMCache · GuideLLM · Prometheus · Grafana · Docker
4 Research Papers · 5 Frameworks · 3 Architecture References · 2 Hardware Platforms

The Lab

Dielabs investigates how architectures shape workload behavior in AI systems. It focuses on the layers that turn inference into a service (runtime, serving stack, and hardware and fabric) and on the system behavior they produce, from latency and throughput dynamics to capacity under load.

I design, benchmark, and validate inference architectures to map their performance, trade-offs, and real-world limits, then turn these findings into opinionated frameworks for day 0–1 decisions: sizing, capacity planning, and workload deployment.

The lab builds on my background in enterprise datacenter and presales engineering. My work is now fully focused on GPU-accelerated inference stacks and the operational discipline required to run them at scale.

Focus Areas

GPU Infrastructure

Homelab node, DCGM, driver stack

Inference Runtimes

vLLM internals, KV cache, batching

Benchmarking

GuideLLM sweeps, crossover analysis

Observability

Prometheus, Grafana, PromQL, DCGM

Memory Engineering

KV offload, LMCache, quantization

Distributed Inference

TP/DP patterns, NUMA, CPU-GPU isomorphism

Five Frameworks for the Inference Lifecycle

From a business need to a system in production: a set of frameworks that turns LLM inference deployment into a sequence of named, defensible decisions, each anchored in lab evidence rather than vendor slides.

Content & Docs

I design, test and deploy systems. AI accelerates the build.