
jul 4, 2026
One box, eight GPUs, and the wires between them
Eight GPUs in one server behave like a small network. NVLink vs PCIe, reading nvidia-smi topo -m, NCCL transports, the ACS trap, and fitting a 70B model.
I do platform work at a startup. This is where I write down the things I had to figure out the hard way, so future-me at 3am doesn't have to.


SDR, a Flipper, a Pwnagotchi, a few antennas I tuned in bad lighting, and an hourglass for measuring how long a deploy actually takes.
See the lair →

jul 2, 2026
A GPU pod sits on a dozen layers from silicon to scheduler, and each one fails its own way. Drivers, the container toolkit, MIG, DCGM, and the metrics…

may 20, 2026
Ten years of git, distilled to the daily eight, an fzf branch picker, and the weekly pruning ritual.

may 15, 2026
Excalidraw is fast, but everything I make in it looks the same. Seven tools that promise visuals with attitude, one diagram, three I'd keep.
Long-form infra writing, every few weeks. No tracking pixels, no marketing sequences, no LinkedIn-isms.
Harshit Luthra. Senior SRE, infrequent essayist, occasional source of production incidents.
4M+ daily active users · 1M/min peak throughput · 99.99% uptime on 95% spot · $200K cloud spend trimmed
What I work in
kubernetes · terraform · aws · next.js · typescript · postgres
23 posts · 11 TILs · writing here since 2018