Using tfjs-node-gpu on a GKE cluster running on an n1-highmem-8 node with an NVIDIA P4 or V100 GPU fails when the cuda_malloc_async allocator is selected via TF_GPU_ALLOCATOR.

System information
Have I written custom code (as opposed to using a s...
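For context, a minimal sketch (not the reporter's actual code) of how the allocator can be selected from a tfjs-node-gpu process. The file layout and the warm-up op are placeholders; the assumption is that TF_GPU_ALLOCATOR has to be in the environment before the native TensorFlow binding creates the GPU device, so the library is loaded only after the variable is set.

```ts
// Hypothetical launcher sketch: pick the stream-ordered allocator, then
// load tfjs-node-gpu and force GPU initialization with a small op.
process.env.TF_GPU_ALLOCATOR = 'cuda_malloc_async';

async function main(): Promise<void> {
  // Dynamic import so the env var is in place before the native TensorFlow
  // binding is loaded and the GPU device (which reads TF_GPU_ALLOCATOR)
  // is created.
  const tf = await import('@tensorflow/tfjs-node-gpu');

  // Warm-up matMul so any allocator problem surfaces immediately instead
  // of in the middle of the real workload.
  const a = tf.randomNormal([1024, 1024]);
  const b = tf.randomNormal([1024, 1024]);
  const c = tf.matMul(a, b);
  await c.data();            // sync with the GPU
  tf.dispose([a, b, c]);     // release the GPU buffers eagerly
  console.log(tf.memory());  // numTensors / numBytes snapshot
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```

Setting the variable in the GKE pod spec (an `env:` entry on the container) or in the launching shell has the same effect and sidesteps any load-order concern.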
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation. Current allocation summary follows.
2024-07-12 08:21:05.997054: I external/local_tsl/tsl/framework/bfc_allocator.cc:1039] BFCAllocato...
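Independent of which allocator is in use, the fragmentation this log warns about is usually driven by many short-lived intermediate tensors staying alive between batches. A hedged sketch of the usual tfjs-side mitigation, with `runBatch` and `model` as placeholder names:

```ts
import * as tf from '@tensorflow/tfjs-node-gpu';

// Keep intermediates scoped so the allocator is not left holding many small,
// long-lived blocks between batches. `model` and the output handling are
// illustrative only.
function runBatch(model: tf.LayersModel, batch: tf.Tensor): Float32Array {
  const out = tf.tidy(() => {
    // Tensors allocated inside tidy() that are not returned are disposed
    // as soon as the callback ends, so intermediates never accumulate.
    const logits = model.predict(batch) as tf.Tensor;
    return tf.softmax(logits);
  });
  const values = out.dataSync() as Float32Array;
  out.dispose();
  console.log(tf.memory()); // watch numTensors / numBytes between batches
  return values;
}
```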
"cudaMalloc failed: " + std::string(cudaGetErrorString(err))) .c_str()); } break; } #endif // TRITON_ENABLE_GPU // Use CPU memory if the requested memory type is unknown // (default case). case TRITONSERVER_MEMORY_CPU: default: { *actual_memory_type = TRITONSERVER...