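📒CUDA-Learn-Notes: CUDA notes / hand-written CUDA for large models / C++ notes, updated sporadically: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot, elementwise, softmax, layernorm, rmsnorm, histogram, relu, sigmoid, etc. Want my treasure? If you want it, I can give you all of it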
GithubX-F / CUDA-Learn-Notes (public repository, forked from xlite-dev/CUDA-Learn-Notes) ...
CUDA Documentation and Release Notes: documentation library containing in-depth technical information on the CUDA Toolkit. CUDA 12 Features Revealed: a technical blog on the CUDA Toolkit 12.0’s features and capabilities.
CUDA-Learn-Notes groups its kernels by topic, and provides PyTorch Python bindings for the kernel implementations under each topic...
vLLM source code: PagedAttention (work in progress). In that case, let me recommend CUDA-Learn-Notes, which I put together myself...
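https://github.com/DefTruth/CUDA-Learn-Notes If you write AI operators, I recommend learning in order of increasing difficulty: activation-function (element-wise) operators -> softmax/normalization operators -> matrix-multiply gemm (conv). In other words, go from simple add/subtract/multiply/divide operations -> reduce operations -> matrix multiplication; the difficulty rises at each step, and so does the room for optimization. 2. I strongly recommend learning to use performance profiling tools...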
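The three difficulty tiers above can be sketched as plain CPU reference functions, which is also a common way to check a GPU kernel's output. This is only an illustrative sketch in pure Python: the function names `relu`, `softmax`, and `matmul` are hypothetical and are not taken from the CUDA-Learn-Notes repository, and nothing here is tuned for performance.

```python
import math

# Tier 1: element-wise. Each output depends on exactly one input,
# so a GPU version maps one thread to one element with no communication.
def relu(xs):
    return [max(0.0, x) for x in xs]

# Tier 2: reduce. Softmax needs two reductions over the whole row
# (a max and a sum), so GPU threads must cooperate (warp/block reduce).
def softmax(xs):
    row_max = max(xs)                       # reduction 1: max, for numerical stability
    exps = [math.exp(x - row_max) for x in xs]
    total = sum(exps)                       # reduction 2: sum of exponentials
    return [e / total for e in exps]

# Tier 3: matrix multiply. a is M x K, b is K x N (lists of rows).
# Every output needs K multiply-adds and inputs are reused heavily,
# which is where tiling and memory-hierarchy optimizations pay off.
def matmul(a, b):
    k, n = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)]
            for row in a]
```

A kernel written for any tier can be compared against the matching function on small random inputs, using a tolerance for floating-point results.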
We use Google C++ Style Guide for all the sources: https://google.github.io/styleguide/cppguide.html Frequently Asked Questions: Answers to frequently asked questions about CUDA can be found at http://developer.nvidia.com/cuda-faq and in the CUDA Toolkit Release Notes. ...
Learn what's new in the CUDA Toolkit, including the latest and greatest features in the CUDA language, compiler, libraries, and tools, and get a sneak peek at what's coming up over the next year. CUDA on NVIDIA Hopper GPU Architecture ...