Remove the gfx900 and gfx906 archs, as they're long in the tooth. This should help reduce the increasing size of ROCm binaries. cc @jeffdaily @sunway513 @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongx...
```
GPU-be60788172fd5d3e
  Marketing Name:          AMD Radeon VII
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        128(0x80)
  Queue Min Size:          64(0x40)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    1
  Device Type:             GPU
  Cache Info:
    L1...
```
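To check whether a machine is affected by the arch removal above, you can grep the ISA names out of `rocminfo`-style output. A minimal sketch, assuming the output contains `Name: gfx906`-style lines (the `sample` text here is illustrative, not taken from the report above):

```python
import re

def gfx_targets(rocminfo_text: str) -> list[str]:
    """Extract gfx ISA names (e.g. gfx906) from rocminfo-style output."""
    # rocminfo prints one agent block per device; the ISA shows up as
    # a bare token like "gfx906" or "gfx90a".
    return sorted(set(re.findall(r"\bgfx\d+[a-z]*\b", rocminfo_text)))

sample = """\
Agent 2
  Name:                    gfx906
  Marketing Name:          AMD Radeon VII
"""
print(gfx_targets(sample))  # ['gfx906']
```

If `gfx906` appears in the result, a PyTorch/ROCm build that drops that target will no longer run kernels on that device.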
Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...): 2x AMD MI60, 1x RTX 3090 (I used only 1x MI60 for batching)
How you installed MLC-LLM (conda, source): `python -m pip install --pre -U -f https://mlc.ai/wheels mlc-llm-nightly-rocm62 mlc-ai-nightly-rocm62`
How you installed TVM...
```
1,2,3" TORCH_BLAS_PREFER_HIPBLASLT=0 PYTORCH_ROCM_ARCH=gfx906 OMP_NUM_THREADS=4 \
  vllm serve "kaitchup/Llama-3.3-70B-Instruct-AutoRound-GPTQ-4bit" \
  --port 8001 --tensor-parallel-size 4 \
  --num-gpu-blocks-override 14430 --max-model-len 4096
#8x PYTHONPATH=/home/$USER/triton-gcn5/pyth...
```
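The `--num-gpu-blocks-override 14430` flag pins the size of vLLM's paged KV cache instead of letting the engine profile free memory. A rough sketch of what that buys you in token capacity, assuming vLLM's default PagedAttention block size of 16 token slots per block (the function name is mine, not a vLLM API):

```python
def kv_cache_tokens(num_gpu_blocks: int, block_size: int = 16) -> int:
    # Each PagedAttention block holds `block_size` token slots; total
    # KV-cache capacity in tokens is blocks * slots per block.
    return num_gpu_blocks * block_size

print(kv_cache_tokens(14430))  # 230880
```

At `--max-model-len 4096`, ~230k cached token slots leave headroom for roughly 56 maximum-length sequences in flight per engine, which is presumably why the override was tuned to that value.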