```python
turbo = dspy.OllamaLocal(model="llama3")
```

and I specified the embeddings model to be a HuggingFace model. If you don't make this specification, LlamaIndex falls back to OpenAI embeddings as its default embeddings model.

```python
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    Settings,
)
```
...
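The override described above can be sketched as the following configuration fragment (assuming `llama-index` with the HuggingFace embeddings integration installed; the model id `BAAI/bge-small-en-v1.5` is just an illustrative choice, not from the original):

```python
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Replace LlamaIndex's default (OpenAI) embedding model with a local
# HuggingFace model, so indexing does not require an OpenAI API key.
# "BAAI/bge-small-en-v1.5" is an example model id, not the one from the post.
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
```

With this set, any `VectorStoreIndex` built afterwards embeds documents with the HuggingFace model instead of calling OpenAI.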
```
Usage: text-embeddings-router [OPTIONS]

Options:
      --model-id <MODEL_ID>
          The name of the model to load. Can be a MODEL_ID as listed on
          <https://hf.co/models> like `thenlper/gte-base`. Or it can be a local
          directory containing the necessary files as saved by `save_pretrained(.....
```
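A typical invocation, going by the help text above, might look like the following sketch (this assumes the `text-embeddings-router` binary is installed; the `--port` flag and the value `8080` are illustrative, not taken from the snippet):

```shell
# Serve the gte-base embedding model listed in the help text's example.
# --port 8080 is an assumed option/value; check the full --help output.
text-embeddings-router --model-id thenlper/gte-base --port 8080
```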
In the model you are using, there is a folder `0_Transformer`, which contains a `tokenizer_config.json`. Add a new entry to that JSON:

```json
"use_fast": false
```

This should load the slower Python tokenizer.

Author MyBruso commented Mar 5, 2021
Thank you @nreimers, I will try using "use_fast"...
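The suggested edit can be applied programmatically with a short stdlib-only sketch (the `0_Transformer/tokenizer_config.json` path follows the folder named above; in a fresh directory the script simply creates the file):

```python
import json
from pathlib import Path

# Path to the tokenizer config inside the model folder mentioned above.
config_path = Path("0_Transformer") / "tokenizer_config.json"

# Load the existing config if present, otherwise start from an empty one.
config = json.loads(config_path.read_text()) if config_path.exists() else {}

# Add the entry that forces the slow Python tokenizer.
config["use_fast"] = False

# Write the updated config back to disk.
config_path.parent.mkdir(parents=True, exist_ok=True)
config_path.write_text(json.dumps(config, indent=2))
```

Re-loading the model after this change should pick up the slow tokenizer.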
```diff
@@ -208,6 +208,10 @@ def test_save_load_strict(self):
     def test_inputs_embeds(self):
         pass

+    @unittest.skip("TODO: Decoder embeddings cannot be resized at the moment")
+    def test_resize_embeddings_untied(self):
+        pass
+
     @require_sentencepiece
     @require_tokenizers
     def test_tiny_model(self):
```
...