enforce eager mode with bnb quantization temporarily (#6846) · bong-furiosa/vllm-bong@bb54946
if self.quantization == "gptq" and not self.enforce_eager:
    # Related issue: https://github.com/vllm-project/vllm/issues/2147
    logger.warning("GPTQ does not support CUDA graph yet. Disabling "
                   "CUDA graph.")
    self.enforce_eager = True

def verify_with_parallel_config(
    self,
    ...
[Bugfix] Set enforce_eager automatically for mllama
enforce eager mode with bnb quantization temporarily
👋 Hi! Thank you for contributing to the vLLM project. Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which consists of a small and essential subset of CI tests to quickly catch errors.
Temporarily enforce eager mode for bitsandbytes quantization until the known issue (#5569) is fixed.
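Below is a minimal sketch of what this workaround could look like, assuming the bitsandbytes check mirrors the gptq check shown above. The helper name _verify_bnb_eager, the config attribute access, and the warning text are illustrative assumptions, not the committed code.

import logging

logger = logging.getLogger(__name__)

def _verify_bnb_eager(config) -> None:
    # Hypothetical helper: force eager mode (i.e. skip CUDA graph capture)
    # whenever bitsandbytes quantization is requested, mirroring the gptq
    # check above. `config` is assumed to expose `.quantization` and
    # `.enforce_eager`, like vLLM's ModelConfig.
    if config.quantization == "bitsandbytes" and not config.enforce_eager:
        # Related issue: https://github.com/vllm-project/vllm/issues/5569
        logger.warning("bitsandbytes quantization does not support CUDA "
                       "graph yet. Disabling CUDA graph.")
        config.enforce_eager = True

With a check like this in place, a user who requests bitsandbytes quantization falls back to eager execution automatically instead of hitting a CUDA graph failure, and the check can be removed once #5569 is resolved.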