TensorRT-LLM

Organization

by NVIDIA

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.

#blackwell #cuda #llm-serving #moe #pytorch

Stars: 13.7k
Forks: 2.4k
Watchers: 13.7k
Issues: 1.3k

Repository Details

Created: Aug 16, 2023
Last Updated: May 26, 2026
Primary Language: Python
License: (not listed)
Repository Size: 1.9M KB

Project Overview

Default Branch: main
