Runtime requirements for the quantized versions:

| Name | Quant method | Bits | Size | Max RAM required | Use case |
| --- | --- | --- | --- | --- | --- |
| causallm_14b.Q4_0.gguf | Q4_0 | 4 | 8.18 GB | 10.68 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
| causallm_14b.Q4_1.gguf | Q4_1 | 4 | 9.01 GB | 11.51 GB | legacy; small, substantial quality loss - prefer using Q3_K_L |
| causallm_14b.Q5_0.gguf | Q5_0 | 5 | … | … | … |
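Because each quant file carries a different "Max RAM required", one way to choose a file is to compare against the memory actually free on the machine. Below is a minimal sketch, not from the original post: it assumes the psutil package is installed and hardcodes only the two rows listed in full above.

```python
# Sketch: pick the largest quant whose "Max RAM required" fits in the
# RAM currently available. Values are taken from the table above.
import psutil

# (file name, max RAM required in GB), ordered largest first
QUANTS = [
    ("causallm_14b.Q4_1.gguf", 11.51),
    ("causallm_14b.Q4_0.gguf", 10.68),
]

def pick_quant():
    avail_gb = psutil.virtual_memory().available / 1024 ** 3
    for name, need_gb in QUANTS:
        if avail_gb >= need_gb:
            return name
    return None  # nothing fits; fall back to a smaller quant such as Q3_K_M

print(pick_quant())
```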
```python
from llama_cpp import Llama

llm = Llama(
    # Raw string so the Windows backslashes are not read as escape sequences
    model_path=r"D:\Downloads\causallm_14b-dpo-alpha.Q3_K_M.gguf",
    chat_format="llama-2",
)
res = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        # User prompt (Chinese): "Write a short romance passage in the style
        # of Jin Ping Mei, about 100 characters, not too explicit..."
        {"role": "user", "content": "来一段金瓶梅风格的情感小说,100字,别太露..."},
    ],
)
# The response follows the OpenAI-style chat completion layout
print(res["choices"][0]["message"]["content"])
```
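If llama-cpp-python was built with GPU support, part or all of the model can be offloaded to VRAM instead of running entirely from system RAM. A variant sketch of the constructor above; the `n_gpu_layers` value is an assumption to tune for your card:

```python
# Variant sketch (not in the original post): offload layers to the GPU.
# Requires a GPU-enabled build of llama-cpp-python (e.g. compiled with CUDA).
llm_gpu = Llama(
    model_path=r"D:\Downloads\causallm_14b-dpo-alpha.Q3_K_M.gguf",
    chat_format="llama-2",
    n_gpu_layers=-1,  # -1 offloads every layer; lower it if VRAM is limited
)
```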