Full error with the stack trace:
Traceback (most recent call last):
  File "/data/users/aakhundov/pytorch/torch/testing/_internal/common_utils.py", line 2744, in wrapper
    method(*args, **kwargs)
  File "/data/users/aakhundov/pytorch/test/inductor/test_control_flow.py", line 326, in ...
from MPI rank 0 of 1
[1709638759.672065] [GPU-LNX-CLUSTER-01:1723628:0] cuda_copy_md.c:182 UCX ERROR cudaHostUnregister(address)() failed: pointer does not correspond to a registered memory region
[1709638759.672078] [GPU-LNX-CLUSTER-01:1723628:0] ucp_mm.c:332 UCX WARN failed to dereg ...
CUDA runtime version: 11.5.119
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA RTX A5000
Nvidia driver version: 515.105.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
...
I think the install of the modules might be failing because of the 11.8 version of CUDA. Does this work for more modern graphics cards?
Traceback (most recent call last):
  File "D:\ComfyUI_windows_portable_nvidia_cu118_or_cpu\ComfyUI_windows_portable\ComfyUI\nodes.py", line 1887, in ...
OSError: meta-llama/Meta-Llama-3-8B-Instruct does not appear to have a file named config.json. Checkout 'https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/tree/None' for available files.
Here's the whole trace
mirdulvultr commented Apr 19, 2024
I tried 8b it worked perfectl...
🐛 Describe the bug
Testing a variety of TP requires_grad patterns (validating maximally flexible finetuning) revealed DTensor sharding propagation of aten.native_layer_norm_backward (default) fails with the following IndexError (tracebac...
I am training on my own data. My data is in VOC format, and I used voc_label.py to prepare it for PyTorch training, but I get:
RuntimeError: cannot perform reduction function max on tensor with no elements because the operation does not have an identity.
Then I modified utils/datasets.py line 53, ‘im...
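PyTorch raises this error when max is called on a tensor that has zero elements. One possible cause, which is an assumption here and not something the report confirms, is that some of the label files produced by voc_label.py are empty, so an image ends up with no label rows. A minimal sketch for checking that, using only the standard library, could look like the following. The function name find_empty_label_files and the idea of one .txt label file per image in a single directory are illustrative assumptions, not part of the repository in question.

```python
from pathlib import Path


def find_empty_label_files(label_dir):
    """Return the .txt files in label_dir that contain no annotation rows.

    Hypothetical helper: assumes one .txt label file per image, all stored
    directly inside label_dir.
    """
    empty = []
    for path in sorted(Path(label_dir).glob("*.txt")):
        # A file holding only whitespace contributes zero label rows.
        if not path.read_text().strip():
            empty.append(path)
    return empty
```

Calling find_empty_label_files on the directory that holds the generated labels returns the offending paths, if any. If the list is empty, the cause is likely elsewhere and the truncated change at utils/datasets.py line 53 would need to be looked at instead.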
DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE
I understand that I will be blocked if I intentionally remove or skip any mandatory* field
Checklist
I'm reporting a bug unrelated to a specific site
I've verified that I'm running yt-dlp version ...
# https://github.com/NVIDIA/nccl (last tested version: v1.2.3-1+cuda8.0)
# USE_NCCL := 1

# Uncomment to use `pkg-config` to specify OpenCV library paths.
# (Usually not necessary -- OpenCV libraries are normally installed in one of the above $LIBRARY_DIRS.)
...