Blog

Writing on LLMs, inference optimization, research, and ideas.

Posts

LLM Quantization Gallery: Making Sense of How We Shrink Large Language Models

INT4, INT8, GPTQ, AWQ, QLoRA — a practical breakdown of the quantization methods reshaping how we deploy LLMs, plus the story behind building an interactive gallery to explore them.

Apr 2026 · 8 min read · LLMs · Quantization · Systems

TurboQuant: How Google Compresses the KV Cache by 6× Without Losing Accuracy

How Google compresses the KV cache by 6× with zero accuracy loss — random rotation, per-element quantization, and QJL residual correction explained. Presented at ICLR 2026.

2026 · 10 min read · Quantization · KV Cache · LLM Inference

ARC-AGI-3: Why This Benchmark Is Different From Everything Else

Program synthesis, test-time training, fluid intelligence — what makes ARC-AGI the hardest benchmark in AI, and my early experiments competing in it with a random agent baseline.

2026 · 7 min read · AGI · Reasoning · Benchmarks

PaperBanana & PaperReviewer: Can AI Agents Actually Help Academic Research?

Google's two new research agents — one for automated figure generation, one for AI-assisted peer review. What they get right, what they miss, and why novelty evaluation is still a hard problem.

2026 · 8 min read · AI Tools · Research · Agents

More posts coming soon. Follow @Asg_Wolverine for updates.