TensorRT-LLM (8): Numerical Precision (translated from GitHub). Author: HelloGPT. Contents: 1. FP32, FP16, and BF16; 2. Quantization and dequantization (Q/DQ), the QuantizerPerToken class; 3. INT8 SmoothQuant (W8A8); 4. INT4 and INT8 weight-only (W4A16 and W8A16) ...
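1. Multi-head, multi-query, and group-query attention. This article describes in detail how TensorRT-LLM implements multi-head attention (MHA), multi-query attention (MQA), and group-query attention (GQA) for autoregressive GPT-like models. Multi-head attention is a sequence of a batched matmul, a softmax…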
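The snippet above is cut off after the softmax; in the standard formulation the softmax is followed by a second batched matmul with the values. The sketch below illustrates that sequence in plain Python. It is not TensorRT-LLM code: the function names (`matmul`, `softmax_rows`, `attention`), the nested-list tensor layout, and the optional causal mask are assumptions made for illustration only.

```python
import math


def matmul(a, b):
    """Multiply two 2-D matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns of a 2-D nested list."""
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Apply a numerically stable softmax to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(q, k, v, causal=True):
    """Hypothetical helper. q, k, v are laid out as [heads][seq][head_dim]."""
    scale = 1.0 / math.sqrt(len(q[0][0]))
    outputs = []
    for qh, kh, vh in zip(q, k, v):  # the loop over heads plays the role of the batch
        # First batched matmul: scaled Q times K transposed.
        scores = [[s * scale for s in row] for row in matmul(qh, transpose(kh))]
        if causal:
            # Assumption: autoregressive models hide future positions.
            scores = [
                [s if j <= i else float("-inf") for j, s in enumerate(row)]
                for i, row in enumerate(scores)
            ]
        probs = softmax_rows(scores)
        # Second batched matmul: attention weights times V.
        outputs.append(matmul(probs, vh))
    return outputs
```

For MQA or GQA, the same idea would apply with one K/V head shared by all query heads (MQA) or by each group of query heads (GQA); in this sketch that would amount to repeating the shared `k` and `v` entries in the input lists.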
History: 311 commits. .github/ISSUE_TEMPLATE: ONNX-TensorRT 21.02 release (#631), Jan 23, 2021. docs: ONNX-TensorRT 10.9-GA Release (#1022)
Newly open-sourced: TensorRT-LLM is now open source. GitHub: https://github.com/NVIDIA/TensorRT-LLM Key Features: TensorRT-LLM contains examples that implement the following features. Multi-head Attention (MHA), Multi-q…
https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/ (official TensorRT documentation, C++ API); https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/ (official TensorRT documentation, Python API); https://github.com/NVIDIA/trt-samples-for-hackathon-cn/tree/master/cookbook; https://github.com/wang-xinyu/tensorrtx ...
The Triton TensorRT-LLM Backend (triton-inference-server/tensorrtllm_backend on GitHub).
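Logger: records errors, warnings, and informational messages. TensorRT provides a basic Logger implementation, but you can also write your own by deriving from tensorrt.ILogger to get more advanced functionality. Parsers: parse a network definition from a trained model. Network: represents a computation graph, which can be populated through Parsers or manually through the Network API.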