    float repeat_penalty;
    int repeat_last_n;
    int max_new_tokens;
    std::string generation_mode;
    std::string input_mode;
@@ -163,7 +166,8 @@ void Qwen::load_tiktoken(std::string tokenizer_path) {
void Qwen::init(const std::vector<int> &devices, std::string model_path, std::string to...
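The `repeat_penalty` / `repeat_last_n` pair in the struct above follows the usual llama.cpp-style convention: tokens seen in the last `repeat_last_n` generated tokens have their logits penalized before sampling. A minimal sketch of that convention in Python (the function name and signature are illustrative, not the project's actual code):

```python
def apply_repeat_penalty(logits, recent_ids, repeat_penalty=1.1, repeat_last_n=64):
    """Penalize tokens that appeared in the last `repeat_last_n` outputs.

    Positive logits are divided by the penalty, negative logits are
    multiplied by it, so repeated tokens always become less likely.
    """
    out = list(logits)
    for tok in set(recent_ids[-repeat_last_n:]):
        if out[tok] > 0:
            out[tok] /= repeat_penalty
        else:
            out[tok] *= repeat_penalty
    return out
```

With `repeat_penalty = 1.0` this is a no-op, which is why 1.0 is the conventional "disabled" value.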
"parse_special_tokens" : true, "mirostat" : 0, "temp" : 0.89999997615814209, "repeat_penalty" : 1.1000000238418579, "repeat_penalty" : 1, "reverse_prompt" : "USER:", "mmap" : true, "add_bos_token" : false, 13 changes: 0 additions & 13 deletions 13 LLMFarm/model_setting_templates...
That said, I would use the base search, then create full panels for each of the search "options", set basic true/false tokens (instead of changing the search), and then hide or show whichever panel is being requested. Cheers, Jacob
completion = client.chat.completions.create(
    ...,
    messages=messages,
    temperature=0.1,
    top_p=0.9,
    max_tokens=4096,
    tools=[],
    extra_body={
        "repetition_penalty": 1.05,
    },
)
req_id = completion.id
total_token = completion.usage.total_tokens
completion_token = completion.usage.completion_tokens
prompt_tokens = completion.usage.prompt_tokens
...
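Note that `repetition_penalty` goes through `extra_body` because it is not part of the standard OpenAI request schema; OpenAI-compatible servers (e.g. vLLM) accept it as an extension. The usage fields read off the response should always satisfy prompt + completion = total; a small sketch of that invariant (the helper name is hypothetical, not part of any SDK):

```python
def usage_consistent(prompt_tokens, completion_tokens, total_tokens):
    """Sanity-check the token accounting returned by the API."""
    return prompt_tokens + completion_tokens == total_tokens
```

This kind of check is useful when aggregating per-request token counts for billing or rate-limit budgeting.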
Increase the repetition_penalty value. Increase the top_k value.

Can you help me solve this problem?

These are all input arguments that you can modify; do_sample=True is an argument of the API. For example:
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=30, do_sample=True)
Please...
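Putting the two suggestions together, the generate call from the thread would gain two keyword arguments (a sketch assuming Hugging Face transformers; `model`, `inputs`, and `streamer` come from the surrounding code and the 1.2 / 100 values are illustrative, not prescribed):

```python
# Generation kwargs from the thread, plus the two advised changes.
gen_kwargs = dict(
    max_new_tokens=30,
    do_sample=True,          # sampling must be on for top_k to take effect
    repetition_penalty=1.2,  # > 1.0 discourages repeated tokens
    top_k=100,               # sample from a wider candidate pool
)
# outputs = model.generate(inputs, streamer=streamer, **gen_kwargs)
```

Keeping the kwargs in a dict makes it easy to sweep `repetition_penalty` and `top_k` when tuning against repetitive output.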