vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Tags: GitHub · AI Tools · Python · Free
★ 76,801 Stars
15,657 Forks
76,801 Watchers
4 Views
Apr 2026 Last Update