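Make sure your CUDA version is compatible with your GPU architecture. NVIDIA's official website lists the GPU architectures that each CUDA version supports. Run nvcc --version to check the currently installed CUDA version. Check CUDA support in PyTorch or other deep learning frameworks: consult the official documentation of PyTorch or the framework you use to confirm that it supports your installed CUDA version. If the framework does not support your CUDA version, you may need to upgrade or...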
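The version check described above can be scripted. The following is a minimal sketch, assuming nvcc prints a line containing `release <major>.<minor>` (for example `Cuda compilation tools, release 12.4, V12.4.131`). The helper names `parse_nvcc_release` and `installed_cuda_release` are hypothetical and not part of any library.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_nvcc_release(text: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release\s+(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release() -> Optional[Tuple[int, int]]:
    """Run `nvcc --version` if nvcc is on PATH; return None otherwise."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print("Toolkit release:", installed_cuda_release())
    try:
        import torch  # optional; only needed for the framework-side check
        print("PyTorch built against CUDA:", torch.version.cuda)
    except ImportError:
        print("PyTorch is not installed in this environment")
```

If `torch.version.cuda` prints `None`, the installed PyTorch is a CPU-only build. A mismatch between the toolkit release and the version PyTorch was built against is a common cause of failures when compiling CUDA extensions.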
Originally posted by @carterbox in #1 (comment) In parallel, if you feel there is significant performance to be gained by building for '8.0,9.0+PTX' or even more archs, then please start doing the steps in this checklist in order to get...
weiji14 changed the title: flash-attn v2.6.3 + TORCH_CUDA_ARCH_LIST=8.0;8.6;8.9;9.0+PTX (Oct 13, 2024). weiji14 mentioned this pull request (Oct 13, 2024): Feature Request: Add multiple outputs for fused_dense_lib and layer_norm #18 (Closed)
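nvcc -gencode arch=compute_52,code=compute_52 -gencode arch=compute_120,code=sm_120 main.cu -o main This is not only compatible with Blackwell, it also paves the way for future GPUs. Math libraries have to keep up too. cuDNN 9+: built on CUDA 12 and already forward compatible at the hardware level, but to get the most out of the new architecture's Tensor Cores you still need to upgrade. cuBLAS and cuFFT: they ship with PTX, so in theory on new GPUs they can...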
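To make the -gencode pairs above concrete, here is a rough sketch that expands an architecture list written in the TORCH_CUDA_ARCH_LIST style (for example `8.0;8.6;8.9;9.0+PTX`) into nvcc flags. It is a simplified illustration, not PyTorch's actual implementation. The function name `arch_list_to_gencode` is hypothetical, and the rule assumed here is that every entry produces native code (`code=sm_XY`) while a `+PTX` suffix additionally embeds PTX (`code=compute_XY`).

```python
from typing import List


def arch_list_to_gencode(arch_list: str) -> List[str]:
    """Expand e.g. '8.0;9.0+PTX' into nvcc -gencode flags (illustrative only)."""
    flags: List[str] = []
    # Accept ';', ',' or whitespace as separators between entries.
    for entry in arch_list.replace(";", " ").replace(",", " ").split():
        want_ptx = entry.endswith("+PTX")
        version = entry[: -len("+PTX")] if want_ptx else entry
        num = version.replace(".", "")  # "8.6" -> "86", "12.0" -> "120"
        # Native machine code (SASS) for this exact architecture.
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
        if want_ptx:
            # Embedded PTX that the driver can JIT-compile on newer GPUs.
            flags.append(f"-gencode=arch=compute_{num},code=compute_{num}")
    return flags


if __name__ == "__main__":
    for flag in arch_list_to_gencode("8.0;8.6;8.9;9.0+PTX"):
        print(flag)
```

The embedded PTX is what allows the driver to JIT-compile a kernel for a GPU newer than any listed `sm_XY` target, which is the forward-compatibility point made above. Each extra architecture increases build time and binary size.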
Hi, I have been struggling to build pointops recently. I ran into the following problem: Traceback (most recent call last): File "setup.py", line 12, in <module> setup( File "/home/zw/anaconda3/envs/sam3d/lib/python3.8/site-packages/setuptools/__init__.py", ...