This looks like out-of-bounds indexing, and I found there is a special modeling_opt.py under model/llm/opt. So what is the correct config for opt-125m?
File "/home/sdp/miniforge3/envs/pytorch-2.3.0+cpu-3.10-spr/lib/python3.10/site-packages/neural_compressor/torch/algorithms/weight_only/gptq.py", line 813, in fasterquant
    H = torch.linalg.cholesky(H)
torch._C._LinAlgError: linalg.cholesky: The factorization could not be completed because th...
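The traceback above is cut off, so the exact cause is not confirmed here. If the failure is the usual one for Cholesky (H is not positive definite, for example because the calibration data left it rank-deficient), a common remedy in GPTQ-style code is to add a small multiple of the mean diagonal to the diagonal of H before factorizing. The sketch below illustrates that idea with plain Python lists so it runs without torch. The names `cholesky`, `damped_cholesky`, `percdamp` and `max_tries` are illustrative assumptions, not neural_compressor's API, and the retry loop that grows the damping is my own addition.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L @ L^T == a, or raise ValueError."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    # Same failure class as the error in the traceback above.
                    raise ValueError(
                        f"leading minor of order {i + 1} is not positive definite"
                    )
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def damped_cholesky(h, percdamp=0.01, max_tries=5):
    """Factorize h + damp * I, growing damp 10x per failed attempt.

    damp starts at percdamp * mean(diag(h)). Both parameters are
    hypothetical knobs chosen for this sketch.
    """
    n = len(h)
    mean_diag = sum(h[i][i] for i in range(n)) / n
    for attempt in range(max_tries):
        damp = percdamp * (10 ** attempt) * mean_diag
        damped = [
            [h[i][j] + (damp if i == j else 0.0) for j in range(n)]
            for i in range(n)
        ]
        try:
            return cholesky(damped), damp
        except ValueError:
            continue
    raise ValueError("matrix is still not positive definite after damping")


# Rank-deficient example: H = x x^T with x = (1, 2), so plain Cholesky fails.
H = [[1.0, 2.0], [2.0, 4.0]]
L, damp = damped_cholesky(H)
```

With torch tensors the same step is typically written as adding `damp` to `torch.diagonal(H)` before calling `torch.linalg.cholesky(H)`. Larger damping makes the factorization more robust but moves the quantization objective further from the true Hessian, so it is worth keeping as small as works.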
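Install from git source
git clone https://github.com/vllm-project/vllm.git
cd vllm
# export VLLM_INSTALL_PUNICA_KERNELS=1 # optionally build for multi-LoRA capability
pip install -e . # This may take 5-10 minutes.
This requires local compilation and is suited to network-restricted environments.
2. Inference testing
2.1 Offline batch inference
Use vLLM for a batch of...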
To use torch.compile, we need to add self.model = torch.compile(self.model) at this line: https://github.com/vllm-project/vllm/blob/main/vllm/worker/model_runner.py#L253. Currently, when I run it on an H100 (with vLLM 0.5.1), I get the following error:
[rank0]: File "/tmp...
assign The following actions uses Node.js version which is deprecated and will be forced to run on node20: actions/github-script@v6. For more info: https://github.blog/changelog/2024-03-07-github-actions-all-actions-will-run-on-node20-instead-of-node16-by-default/