W8A8/W4A8 inference and optimized scaled dot-product attention (SDPA) on Apple Silicon: uses the otherwise unused INT8 TensorOps in M5 for 1.2–1.9× faster LLM prefill, and adds FlashInfer-inspired GQA decode attention for up to 1.6× SDPA speedup. Built as MLX custom primitives.
<a href="https://trending-repos.com/repositories/Mininglamp-AI/cider"><img src="https://trending-repos.com/badge/Mininglamp-AI/cider.svg" alt="Trending Repos" /></a>