A unified library of state-of-the-art model optimization techniques, including quantization, pruning, distillation, and speculative decoding. It compresses deep learning models for downstream deployment frameworks such as TensorRT-LLM, TensorRT, and vLLM to improve inference speed.
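To make the idea of compression by quantization concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is not the Model Optimizer API; the function names `quantize_int8` and `dequantize_int8` and the sample weights are hypothetical and chosen only for illustration.

```python
# Illustrative only: symmetric per-tensor int8 quantization in plain Python.
# This is NOT the Model Optimizer API; every name below is hypothetical.

def quantize_int8(values):
    """Map floats to integer codes in [-127, 127] using one shared scale."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Guard against an all-zero (or empty) tensor to avoid dividing by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate float values from integer codes."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.03, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

# Rounding to the nearest code bounds the per-value error by half a step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight is stored as a single 8-bit code plus one shared scale, which is where the memory and bandwidth savings come from. Production quantizers typically add calibration data and per-channel or per-block scales on top of this basic scheme.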
This repository is tracked by Trending Repos. The badge upgrades automatically if it ever cracks the top 100.
<img src="https://trending-repos.com/badge/NVIDIA/Model-Optimizer.svg" alt="Trending Repos" />