Qwen1.5-7B-Chat-GPTQ-Int4 needs `"disable_exllama": true` added under `"quantization_config"` in its config.json, otherwise loading fails:

```json
{
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_...
```
Typically, this parameter takes the value "true" or "false", indicating whether the ExLlama backend is disabled.
3. Set the DisableExllama parameter according to your application's needs and deployment scenario. If set to "true", the ExLlama GPTQ kernels are disabled; if set to "false" or left at the default, the ExLlama backend stays enabled.
4. After saving the configuration file or applying the setting, restart the application so that the parameter takes effect.
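For reference, a minimal sketch of the edited "quantization_config" section, assuming a typical GPTQ-Int4 export (the surrounding fields here are illustrative; keep whatever your model's config.json already contains and only add the `disable_exllama` key):

```json
"quantization_config": {
  "bits": 4,
  "group_size": 128,
  "quant_method": "gptq",
  "disable_exllama": true
}
```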
```python
import transformers

# model_args / training_args come from the surrounding training script
config = transformers.AutoConfig.from_pretrained(
    model_args.model_name_or_path,
    cache_dir=training_args.cache_dir,
    trust_remote_code=True,
)
config.use_cache = False
# Disable the ExLlama GPTQ kernels: newer transformers releases read
# `use_exllama`, older ones read `disable_exllama`, so set both.
config.quantization_config.use_exllama = False
config.quantization_config.disable_exllama = True

# Load model and tokenizer (continuation truncated in the original;
# presumably something along these lines)
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_args.model_name_or_path,
    config=config,
    cache_dir=training_args.cache_dir,
    trust_remote_code=True,
)
```
ValueError: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU. You can deactivate exllama backend by setting `disable_exllama=True` in the quantization config object

Hi, may I ask how you load the model? In my case, with a single GPU, I also had that problem...
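A minimal sketch of the two usual fixes for this ValueError, assuming a transformers version whose `GPTQConfig` still accepts `disable_exllama` (the model id is a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, GPTQConfig

model_id = "Qwen/Qwen1.5-7B-Chat-GPTQ-Int4"  # placeholder GPTQ checkpoint

# Option 1: pin every module to one GPU so the ExLlama backend can be used
# ({"": 0} maps the whole model onto GPU 0, i.e. no cpu/disk offload).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map={"": 0},
    torch_dtype=torch.float16,
)

# Option 2: turn the ExLlama backend off, which makes cpu/disk offload legal
# again at the cost of slower GPTQ kernels.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=GPTQConfig(bits=4, disable_exllama=True),
    device_map="auto",
)
```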
```diff
-    if shared.args.disable_exllama:
+    if shared.args.disable_exllama or shared.args.disable_exllamav2:
         try:
-            gptq_config = GPTQConfig(bits=config.quantization_config.get('bits', 4), disable_exllama=True)
+            gptq_config = GPTQConfig(
+                bits=config.quantization_config.get('bits', 4),
```
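The diff is truncated after the `bits` argument; presumably the rewritten call forwards both CLI flags into `GPTQConfig`. A sketch of the likely shape, where the `disable_exllama`/`disable_exllamav2` keyword arguments are an assumption mirroring the flag names rather than something confirmed by the snippet (`shared` and `config` come from the surrounding webui module):

```python
from transformers import GPTQConfig

# assumed continuation of the truncated diff above
gptq_config = GPTQConfig(
    bits=config.quantization_config.get('bits', 4),
    disable_exllama=shared.args.disable_exllama,      # assumed kwarg
    disable_exllamav2=shared.args.disable_exllamav2,  # assumed kwarg
)
```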
```diff
@@ -685,6 +685,13 @@ gpt_params_context gpt_params_parser_init(gpt_params & params, llama_example ex,
             params.n_keep = value;
         }
     ));
+    add_opt(llama_arg(
+        {"--no-context-shift"},
+        format("disables context shift on infinite te...
```
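For context, "context shift" is llama.cpp's mechanism for generating past the context window: when the KV cache fills up, the first `n_keep` tokens are kept, a chunk after them is discarded, and the remainder slides down so generation can continue indefinitely; the new `--no-context-shift` flag turns that off so generation simply stops at the context limit. A rough Python illustration of the idea (a sketch, not llama.cpp source; the names follow llama.cpp's `n_keep`/`n_discard` parameters):

```python
def shift_context(tokens: list[int], n_ctx: int, n_keep: int) -> list[int]:
    """Sketch of llama.cpp-style context shifting: once the window is full,
    keep the first n_keep tokens, drop half of what follows, and slide the
    rest down so new tokens fit."""
    if len(tokens) < n_ctx:
        return tokens  # still room in the window; nothing to shift
    n_discard = (len(tokens) - n_keep) // 2  # llama.cpp discards half the shiftable span
    return tokens[:n_keep] + tokens[n_keep + n_discard:]
```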