The model changes with the endpoint: for chat and chat-completion, use gpt2; for embeddings, use intfloat/e5-mistral-7b-instruct. When the server is running, you can run GenAI-Perf with the genai-perf command to get the results. You see the results visually, which are ...
MODEL="llama-3.1-8b-instruct" TOKENIZER="meta-llama/Meta-Llama-3.1-8B-Instruct" genai-perf profile \ --model ${MODEL} \ --tokenizer ${TOKENIZER} \ --service-kind openai \ --endpoint-type chat \ --url localhost:9000 \ --streaming Example output 2024-10-14 22:43 [INFO] genai...
A middleware that provides an OpenAI-compatible endpoint that can call MCP tools - SecretiveShell/MCP-Bridge
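Because the bridge exposes a standard OpenAI-compatible API, an existing client can simply point the OpenAI SDK at it. A minimal sketch, assuming the bridge listens on http://localhost:8000/v1; the base URL, key, and model name are placeholders, not values from the project:

```python
from openai import OpenAI

# Point the standard OpenAI client at the bridge instead of api.openai.com.
# The base URL, API key, and model name are assumptions for illustration.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="gpt-4o-mini",  # whichever model the backend behind the bridge serves
    messages=[{"role": "user", "content": "What tools can you call?"}],
)
print(response.choices[0].message.content)
```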
* removed merge queue forced run
* addressed some PR comments
* OpenAI endpoint requires that the system message is first if present
* changed OpenAI-compatible interface to take function_name in the model field

---
Co-authored-by: Gabriel Bianconi <1275491+GabrielBianconi@users.noreply.github.com>
main (#...
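The note about system-message ordering matters when building request payloads: OpenAI-style chat endpoints expect the system message, if any, to be the first entry in the messages list. A small illustrative sketch, not taken from the PR itself:

```python
# OpenAI-style chat endpoints expect the system message, when present,
# to come first in the messages list.
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Summarize the release notes."},
]
```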
To use LiteLLM with an OpenAI-compatible endpoint, you integrate the API with your application, enabling natural language processing capabilities. For editing short videos, you can use the CapCut template feature to streamline the video editing process, making it easier to apply templates ...
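A minimal sketch of pointing LiteLLM at a self-hosted OpenAI-compatible server; the base URL and model name below are placeholders, not values from any particular deployment:

```python
from litellm import completion

# api_base and the model name are placeholders for whatever
# OpenAI-compatible server you are running.
response = completion(
    model="openai/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    api_base="http://localhost:9000/v1",
    api_key="not-needed",
)
print(response.choices[0].message.content)
```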
This PR enables llama-vscode to talk to an OpenAI-compatible endpoint in lieu of llama.cpp running locally. Tested with OpenAI's exposed endpoint as well as a local vLLM server endpoint. I'm not sure if I translated all the llama.cpp arguments correctly for an OpenAI API, so I imagine we...
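The argument translation mentioned above mostly amounts to renaming llama.cpp sampling options to their OpenAI request-body equivalents. A rough sketch of one plausible mapping; the exact correspondence used by the PR is not shown here, and the field names are assumptions:

```python
# One plausible mapping from llama.cpp-style options to OpenAI chat
# parameters; the actual translation in the PR may differ.
def to_openai_params(llama_args: dict) -> dict:
    return {
        "max_tokens": llama_args.get("n_predict"),
        "temperature": llama_args.get("temp"),
        "top_p": llama_args.get("top_p"),
        "stop": llama_args.get("stop"),
        "stream": llama_args.get("stream", True),
    }
```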
### OpenAI Compatible Endpoint

```shell
export OPENAI_API_KEY="<your api key>"  # If <your-api-base> requires an API KEY, set this value.
cover-agent \
  ... \
  --model "openai/<your model name>" \
  --api-base "<your-api-base>"
```

## Development

This section discusses the develo...
matbee-eth commented Sep 29, 2024: The current implementation of local means no sharding/tensor parallelism, etc., and refuses to work on my dual 4090 setup. How do I enable multi-GPU, or how do I enable a proper system like vLLM...
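For reference, vLLM shards a model across GPUs through its tensor parallelism setting (tensor_parallel_size in the Python API, or --tensor-parallel-size when launching its OpenAI-compatible server). A minimal sketch assuming two GPUs; the model name is a placeholder:

```python
from vllm import LLM, SamplingParams

# Shard the model across two GPUs; the model name is a placeholder.
llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct", tensor_parallel_size=2)

outputs = llm.generate(["Hello, how are you?"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```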
Hi @random-forests @markmcd, Adding a tutorial to show how to call Gemini through an OpenAI-compatible endpoint (via LiteLLM). This supports Google AI Studio and Vertex AI as well. Let me know if there's anything further required here!
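For context, the LiteLLM call the tutorial describes looks roughly like this; the model name and environment variable are assumptions based on LiteLLM's Gemini provider, not taken from the tutorial itself:

```python
import os
from litellm import completion

# Google AI Studio key; Vertex AI uses different credentials.
os.environ["GEMINI_API_KEY"] = "<your api key>"

response = completion(
    model="gemini/gemini-1.5-flash",  # placeholder Gemini model name
    messages=[{"role": "user", "content": "Say hello in three languages."}],
)
print(response.choices[0].message.content)
```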
Closes #592. During memgpt configure, if using an OpenAI endpoint, pull the list of models. Also, allow users to choose an "enter yourself" custom opt...
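Pulling the model list from an OpenAI-compatible endpoint is a single call in the OpenAI SDK; a minimal sketch, assuming the base URL and key come from whatever the user entered during configuration:

```python
from openai import OpenAI

# Base URL and key are placeholders for the values gathered during configuration.
client = OpenAI(base_url="https://api.openai.com/v1", api_key="<your api key>")

# List the models the endpoint exposes so the user can pick one.
for model in client.models.list():
    print(model.id)
```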