Mininglamp-AI/cider

W8A8/W4A8 inference and optimized SDPA on Apple Silicon, built as MLX custom primitives. It puts the M5's otherwise unused INT8 TensorOps to work for 1.2–1.9× faster LLM prefill, and adds FlashInfer-inspired GQA decode attention for up to a 1.6× SDPA speedup.

Python · 426 stars · 25 forks · MIT

Topics: apple-silicon, metal, mlx, quantization, w4a8, w8a8
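The description does not spell out what "W8A8" computes. As a rough illustration only, the pure-Python sketch below shows a generic W8A8 linear layer. It assumes symmetric INT8 quantization with one scale per activation row (token) and one scale per weight row (output channel). The integer products are summed exactly and then rescaled once in floating point. The function names are hypothetical. This is not cider's API, and it does not reproduce its Metal kernels or its actual quantization scheme.

```python
# Hypothetical sketch of a generic W8A8 linear layer (not cider's implementation).
# Activations: symmetric INT8, one scale per token (row of x).
# Weights: symmetric INT8, one scale per output channel (row of w).


def quantize_rows(rows):
    """Symmetric INT8 quantization with one scale per row; values clamp to [-127, 127]."""
    q_rows, scales = [], []
    for row in rows:
        amax = max(abs(v) for v in row)
        scale = amax / 127.0 if amax > 0 else 1.0  # avoid dividing by zero on all-zero rows
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def w8a8_matmul(x, w):
    """Approximate x @ w.T, where x is [tokens][in] and w is [out][in]."""
    qx, sx = quantize_rows(x)  # per-token activation scales
    qw, sw = quantize_rows(w)  # per-output-channel weight scales
    out = []
    for i, xi in enumerate(qx):
        row = []
        for j, wj in enumerate(qw):
            # INT8 x INT8 products summed as exact integers (an INT32 accumulator on hardware)
            acc = sum(a * b for a, b in zip(xi, wj))
            row.append(acc * sx[i] * sw[j])  # dequantize with both scales
        out.append(row)
    return out


def fp_matmul(x, w):
    """Float reference for x @ w.T."""
    return [[sum(a * b for a, b in zip(xi, wj)) for wj in w] for xi in x]


x = [[0.5, -1.2, 3.3, 0.0], [2.0, 0.1, -0.7, 1.5]]
w = [[0.2, -0.4, 0.9, 1.1], [-1.0, 0.3, 0.05, 0.6], [0.7, 0.7, -0.2, 0.0]]
approx = w8a8_matmul(x, w)
exact = fp_matmul(x, w)
```

Comparing `approx` with `exact` shows small quantization error. A W4A8 variant would store weights in 4 bits, typically with group-wise scales, and widen them to INT8 before the same integer multiply-accumulate and rescale.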
