guqiong96/Lvllm

LvLLM is a NUMA extension of vLLM that makes full use of CPU and memory resources, reduces GPU memory requirements, and features an efficient GPU-parallel and NUMA-parallel architecture, supporting hybrid inference for large MoE models.

Python · 370 stars · 33 forks · Apache-2.0

cpu, decode, gpu, hybrid, inference, model, moe, numa, parallelism, prefill, vllm

Rankings

Daily: #5324

Weekly: #9623

Monthly: #13204


Add a Trending Repos badge to your README

This repository is tracked by Trending Repos. The badge upgrades automatically the moment the repository reaches the top 100, so you can set it once and forget it.

Markdown (paste into your README)

[![Trending Repos](https://trending-repos.com/badge/guqiong96/Lvllm.svg)](https://trending-repos.com/repositories/guqiong96/Lvllm)

HTML

<a href="https://trending-repos.com/repositories/guqiong96/Lvllm"><img src="https://trending-repos.com/badge/guqiong96/Lvllm.svg" alt="Trending Repos" /></a>

Raw image URL

https://trending-repos.com/badge/guqiong96/Lvllm.svg
