LLM Quantization Gallery - Arpit Singh Gautam

The LLM Quantization Gallery is an interactive tool for exploring and comparing quantization techniques for large language models. It provides a hands-on way to understand how different quantization methods affect model quality, size, and inference efficiency.

Quantization is one of the most impactful techniques for deploying LLMs efficiently: representing weights at lower bit widths reduces memory footprint and can accelerate inference. This gallery makes those trade-offs concrete and navigable.
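
To make "representing weights at lower bit precisions" concrete, here is a minimal sketch of symmetric round-to-nearest quantization in plain Python. It is only an illustration of the basic idea, not the gallery's implementation or any specific method such as GPTQ or AWQ; the function names and the sample weights are made up for the example.

```python
def quantize_symmetric(weights, bits=8):
    """Map floats to signed integer codes using one shared scale (illustrative helper)."""
    qmax = 2 ** (bits - 1) - 1              # 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for demonstration.
weights = [0.12, -0.5, 0.33, 0.9, -0.07]

for bits in (8, 4):
    codes, scale = quantize_symmetric(weights, bits)
    restored = dequantize(codes, scale)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"{bits}-bit: codes={codes}, max abs error={max_err:.4f}")
```

Fewer bits means a coarser grid of representable values, so the reconstruction error grows as the bit width shrinks. That is the trade-off the gallery visualizes.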

What It Covers

Comparison of quantization methods and formats (INT8, INT4, GPTQ, AWQ, GGUF, and more)
Quality metrics (perplexity, benchmark scores) across bit widths
Model size and memory footprint comparisons
Inference speed benchmarks across hardware configurations
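
Two of the quantities listed above can be sketched with simple arithmetic. The snippet below is a rough illustration under stated assumptions: perplexity is computed as the exponential of the mean negative log-likelihood per token, and weight memory is estimated as parameters times bits per weight, ignoring scales, zero-points, activations, and the KV cache. The function names, the 7-billion parameter count, and the log-probabilities are hypothetical, not results from the gallery.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from natural-log probabilities of each observed token."""
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def weight_memory_gib(num_params, bits_per_weight):
    """Approximate weight storage in GiB, excluding quantization metadata."""
    return num_params * bits_per_weight / 8 / 2**30


# Hypothetical per-token log-probabilities from some model.
log_probs = [math.log(0.5), math.log(0.25), math.log(0.125)]
print(f"perplexity: {perplexity(log_probs):.2f}")

# Hypothetical model size, used only for illustration.
num_params = 7_000_000_000
for label, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{label}: {weight_memory_gib(num_params, bits):.2f} GiB")
```

Real quantized checkpoints are somewhat larger than this estimate because group-wise scales and other metadata are stored alongside the integer codes.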

Background

This project grew out of my work at Dell Technologies (CSG CTO Lab), where I developed an RL-based framework for post-training quantization (PTQ) of LLMs, achieving 2.6× compression over baseline methods with minimal perplexity loss. The gallery is a public-facing complement to that research, making quantization trade-offs accessible to practitioners.

Open the Gallery