LLM Quantization Gallery

LLM, Quantization, Model Compression, Inference, Interactive

The LLM Quantization Gallery is an interactive tool for exploring and comparing quantization techniques for large language models. It provides a hands-on way to understand how different quantization methods affect model quality, size, and inference efficiency.

Quantization is one of the most impactful techniques for deploying LLMs efficiently: it reduces memory footprint and accelerates inference by representing weights at lower bit precision. This gallery makes those trade-offs concrete and navigable.
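To make the idea of "representing weights at lower bit precision" concrete, here is a minimal sketch of round-to-nearest symmetric quantization in plain Python. It is an illustration only, not the method used in the gallery or in the research described below; the function names (`quantize_symmetric`, `dequantize`, `mean_abs_error`) and the sample weights are hypothetical, and the size ratio assumes a 16-bit floating-point baseline.

```python
def quantize_symmetric(weights, bits=8):
    """Map floats to signed integer codes in [-qmax, qmax] with one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits, 1 for 2 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from integer codes."""
    return [c * scale for c in codes]


def mean_abs_error(a, b):
    """Average absolute difference between two equal-length lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical weights, chosen only for illustration.
weights = [0.12, -0.53, 0.98, -0.07, 0.31, -0.88]

for bits in (8, 4, 2):
    codes, scale = quantize_symmetric(weights, bits)
    err = mean_abs_error(weights, dequantize(codes, scale))
    # Size ratio assumes the original weights are stored in 16 bits.
    print(f"{bits}-bit: codes={codes} error={err:.4f} size_vs_fp16={bits / 16:.3f}")
```

Running the loop shows the trade-off directly: fewer bits shrink storage proportionally, while the reconstruction error grows because more weights collapse onto the same code. Practical schemes typically use per-channel or per-group scales rather than one scale for the whole tensor.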

What It Covers

Background

This project grew out of my work at Dell Technologies (CSG CTO Lab), where I developed an RL-based framework for post-training quantization of LLMs that achieved 2.6× compression over baseline methods with minimal perplexity loss. The gallery is a public-facing complement to that research, making quantization trade-offs accessible to practitioners.

Open the Gallery