in load_model
    model = load_quantized(model_name)
  File "G:\Oobabooga Text UI\oobabooga-windows\oobabooga-windows\text-generation-webui\modules\GPTQ_loader.py", line 151, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, ...
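The failing call reads shared.args.wbits and shared.args.groupsize, which normally come from the --wbits and --groupsize server flags. As a minimal sketch of driving that same load path directly, assuming it runs from the text-generation-webui repo root so the modules package is importable (the model folder name and the flag values here are placeholders):

from modules import shared
from modules.GPTQ_loader import load_quantized

shared.args.wbits = 4        # normally set via the --wbits CLI flag
shared.args.groupsize = 128  # normally set via the --groupsize CLI flag
model = load_quantized("my-gptq-model")  # placeholder folder under models/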
A good small model for testing is GALACTICA 125M loaded through the transformers loader. This doesn't work: the loader checks whether CUDA is available and otherwise falls back to the CPU, rather than trying the extension. It would also be a good idea to call "source /opt/intel/oneapi/setvars.sh" from the script...
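A minimal sketch of the device-selection order being suggested, assuming the extension in question is intel_extension_for_pytorch; the get_device helper is hypothetical, not something from the webui code:

import torch

def get_device() -> torch.device:
    # Hypothetical helper: prefer CUDA, then try the Intel XPU backend,
    # and only fall back to the CPU if neither is usable.
    if torch.cuda.is_available():
        return torch.device("cuda")
    try:
        import intel_extension_for_pytorch  # noqa: F401  (registers torch.xpu)
        if hasattr(torch, "xpu") and torch.xpu.is_available():
            return torch.device("xpu")
    except ImportError:
        pass
    return torch.device("cpu")

print(get_device())

torch.xpu only reports the GPU when the oneAPI runtime libraries are on the library path, which is presumably why sourcing setvars.sh before launching matters.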
llama_model_loader: - kv   9:                 gemma2.attention.key_length u32              = 256
llama_model_loader: - kv  10:               gemma2.attention.value_length u32              = 256
llama_model_loader: - kv  11:                           general.file_type u32              = 18
llama_model_loader: - kv  12:               gemma2.attn_logit_softcapping f32              = 50.000000
llama_model_...
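Those kv entries can be inspected without loading the model at all. A minimal sketch, assuming the gguf Python package from the llama.cpp repo (pip install gguf); the filename is a placeholder, and the field-access pattern follows that package's reader layout:

from gguf import GGUFReader, GGUFValueType

reader = GGUFReader("gemma-2-9b-it-Q6_K.gguf")  # placeholder filename
for name, field in reader.fields.items():
    if field.types == [GGUFValueType.STRING]:
        # String values are stored as raw uint8 parts.
        value = bytes(field.parts[field.data[-1]]).decode("utf-8")
    elif len(field.types) == 1:
        # Scalar numerics (u32, f32, ...) keep the value in the last part.
        value = field.parts[field.data[-1]][0]
    else:
        continue  # skip array-typed fields for brevity
    print(f"{name} = {value}")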
line 302, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "C:\Users\geckocakes\Downloads\oobabooga-windows\text-generation-webui\modules\models.py", line 102, in load_model
    model = load_quantized(model_name)
  File "C:...
21:21:02-869971 ERROR    Failed to load the model.

Traceback (most recent call last):
  File "K:\text-generation-webui-main\modules\ui_model_menu.py", line 231, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
  ...
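To surface the full exception that the UI wrapper condenses into that "Failed to load the model" message, a minimal repro sketch, assuming it runs from the text-generation-webui repo root; the load_model(name, loader) signature is the one visible in the traceback above, and the model and loader names are placeholders:

from modules import shared
from modules.models import load_model

shared.model_name = "my-model"  # placeholder folder under models/
shared.model, shared.tokenizer = load_model(shared.model_name, "llama.cpp")

Run in a plain terminal, this prints the complete traceback instead of the truncated one the web UI shows.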