by vllm-project
A high-throughput and memory-efficient inference and serving engine for LLMs.
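
As a quick orientation, here is a minimal offline-inference sketch using vLLM's Python API; the model ID and sampling values are illustrative, not a recommendation:

```python
from vllm import LLM, SamplingParams

# Load a model from the Hugging Face Hub (model choice here is illustrative).
llm = LLM(model="facebook/opt-125m")

# Sampling settings for generation; values are arbitrary examples.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Batch of prompts; vLLM schedules them together for high throughput.
prompts = ["The capital of France is"]
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    # Each RequestOutput holds one or more completions; print the first.
    print(output.outputs[0].text)
```

The same engine can also be exposed as an OpenAI-compatible HTTP server via `vllm serve <model>`, which is the typical path for serving rather than offline batch inference.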