```shell
# get quantized llama-2
curl -L https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q4_K_M.bin --output ./models/llama-2-7b-chat.ggmlv3.q4_K_M.bin

# build and run
docker compose up --build
```

In short, this process downloads the quantized Llama-2-7b-chat model and then …
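The `docker compose up --build` step assumes a compose file sits next to the `models/` directory. The original does not include one, so the sketch below is hypothetical: the service name, port, and mount path are illustrative assumptions, not taken from the project.

```yaml
# Hypothetical docker-compose.yml; service name, port, and paths are
# illustrative assumptions, not from the original project.
services:
  llama:
    build: .                 # build the image from the local Dockerfile
    volumes:
      - ./models:/models     # expose the quantized GGML file downloaded above
    ports:
      - "8000:8000"          # whatever port the serving process listens on
```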
## API Endpoints

- Generate a completion
- Generate a chat completion
- Create a Model
- List Local Models
- Show Model Information
- Copy a Model
- Delete a Model
- Pull a Model
- Push a Model
- Generate Embeddings
- List Running Models

## Conventions

### Model names

Model names follow a `model:tag` format, where `model` can have an optional namespace such as `exam…`
> `api.md` (35.92 KB) — last commit by Jeffrey Morgan, 7 months ago: "Update api.md".
Some examples are `orca-mini:3b-q4_1` and `llama2:70b`. The tag is optional and, if not provided, defaults to `latest`. The tag is used to identify a specific version.

### Durations

All durations are returned in nanoseconds.

### Streaming responses

Certain endpoints stream responses as JSON objects …
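Because durations come back in nanoseconds, a client usually converts them before display. A minimal shell sketch of that conversion — the JSON below is a fabricated sample shaped like a final response object (`total_duration` follows the convention above; the value is made up):

```shell
# Fabricated sample of a final response object; durations are nanoseconds.
resp='{"model":"llama2","done":true,"total_duration":5043500667}'

# Extract total_duration (sed keeps the sketch dependency-free; jq also works).
ns=$(printf '%s' "$resp" | sed -n 's/.*"total_duration":\([0-9]*\).*/\1/p')

# Convert nanoseconds to seconds for display.
awk -v ns="$ns" 'BEGIN { printf "%.3f s\n", ns / 1e9 }'   # → 5.044 s
```

The same pattern applies to the other duration fields an endpoint may return.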