What is the issue? When I set the environment variable OLLAMA_NUM_PARALLEL=3, I got an exception on multi-threaded requests to a single model, as shown in the figure. I also found abnormal output in the log. Is this a problem with the model or a problem of multi-thr...
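A minimal sketch of how such concurrent requests might be sent, assuming the server was started with `OLLAMA_NUM_PARALLEL=3 ollama serve` and listens on the default local port. The model name `llama3` and the helper names `generate` and `run_parallel` are placeholders for illustration, not taken from the report.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Default local Ollama endpoint; adjust if the server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def generate(prompt, model="llama3", url=OLLAMA_URL, timeout=120):
    """Send one non-streaming generate request and return the response text."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


def run_parallel(prompts, send=generate, workers=3):
    """Issue the prompts from `workers` threads; results keep prompt order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send, prompts))


if __name__ == "__main__":
    # Three simultaneous requests to the same model.
    for answer in run_parallel(["Why is the sky blue?"] * 3):
        print(answer[:80])
```

Passing a different `send` callable makes it possible to exercise the threading logic without a running server.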
It would be great if you could set OLLAMA_NUM_PARALLEL per model. Example use case: You have one large "smart" model that you only ever want to receive one request at a time, to avoid using all your memory. You have a smaller "fast" model (or just one with a smaller context) that...
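The per-model limit described in this request can be approximated on the client side. The sketch below is not an Ollama setting: it is a plain semaphore per model name, and the names `LIMITS`, `model_slot`, and `call_model` as well as the limits themselves are hypothetical.

```python
import threading
from contextlib import contextmanager

# Hypothetical per-model concurrency limits; names and numbers are illustrative.
LIMITS = {"smart": 1, "fast": 4}
_slots = {name: threading.BoundedSemaphore(n) for name, n in LIMITS.items()}


@contextmanager
def model_slot(model):
    """Block until a slot for `model` is free and release it on exit."""
    sem = _slots[model]
    sem.acquire()
    try:
        yield
    finally:
        sem.release()


def call_model(model, send, *args, **kwargs):
    """Run `send(*args, **kwargs)` while holding one slot for `model`."""
    with model_slot(model):
        return send(*args, **kwargs)
```

Here `send` stands for whatever function performs the actual request. This only throttles requests that pass through the same client process, so it does not limit other clients of the same server.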
server: fix model reloads when setting `OLLAMA_NUM_PARALLEL` by jmorganca · Pull Request #5560 · ollama/ollama