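The previous post, 南朝四百八十寺: "Representation learning: using GPT-style generative LLMs for embeddings", discussed using a GPT model's hidden layers as text embeddings. The 2022 paper "SGPT: GPT Sentence Embeddings for Semantic Search" explored the same question; this…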
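If you want to quickly try out search technology or applications of sentence embeddings, PaddleNLP's open-source Pipelines implementation is a good starting point. It ships with the RocketQA family of models and lets you build a retrieval system without any training, including a backend and a frontend. https://github.com/PaddlePaddle/PaddleNLP/tree/develop/pipelines Also, if you want to...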
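In the first step, the retriever encoder extracts a vector for the question Q, and that vector is used to retrieve documents (the documents come from an initialized retriever; computing their similarity to the query embedding yields the top-k documents). In the second step, the retriever likelihood is computed in order to normalize over the retrieved documents. It is essentially a softmax formula, and this is treated as the student ...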
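The two steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the method's actual implementation: the dot product as the similarity measure, the `temperature` parameter, and the function names `retrieve_top_k` and `retriever_likelihood` are all assumptions made for the example.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(query_vec, doc_vecs, k):
    """Step 1: return (doc_index, score) pairs for the k best-scoring documents.

    Similarity is assumed to be a dot product (hypothetical choice).
    """
    scored = [(i, dot(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def retriever_likelihood(scores, temperature=1.0):
    """Step 2: softmax over the retrieved documents' scores.

    `temperature` is a hypothetical knob, not taken from the source.
    """
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Toy data: one 2-d query embedding and four document embeddings.
query = [1.0, 0.0]
docs = [[0.9, 0.1], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]

top = retrieve_top_k(query, docs, k=3)
probs = retriever_likelihood([score for _, score in top])
```

The probabilities are normalized only over the k retrieved documents, not over the whole corpus, which is what makes the likelihood cheap to compute.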
tensors="pt")
    return batch_tokens

def get_weightedmean_embedding(batch_tokens, model):
    # Get the embeddings
    with torch.no_grad():
        # Get hidden state of shape [bs, seq_len, hid_dim]
        last_hidden_state = model(**batch_tokens, output_hidden_states=True, return_dict=True).last_hidden...
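Since the snippet above is cut off before the pooling step, here is a minimal sketch of what a position-weighted mean could look like. It assumes that the weights grow linearly with token position and that padding tokens get zero weight, as the function name suggests. It is written in plain Python for a single sequence so that it runs without `torch`; `weighted_mean_pool` and the toy data are hypothetical and not taken from the original code.

```python
def weighted_mean_pool(hidden_states, attention_mask):
    """Position-weighted mean over one sequence.

    hidden_states: list of seq_len vectors, each a list of hid_dim floats
    attention_mask: list of seq_len ints, 1 for real tokens, 0 for padding
    Assumes at least one unmasked token (otherwise the weight sum is zero).
    """
    hid_dim = len(hidden_states[0])
    pooled = [0.0] * hid_dim
    weight_sum = 0.0
    pairs = zip(hidden_states, attention_mask)
    for position, (vec, keep) in enumerate(pairs, start=1):
        w = position * keep  # later tokens weigh more; padding weighs zero
        weight_sum += w
        for j in range(hid_dim):
            pooled[j] += w * vec[j]
    return [x / weight_sum for x in pooled]


# Toy sequence: three real tokens followed by one padding token.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]
embedding = weighted_mean_pool(states, mask)
```

The intuition behind weighting later positions more heavily is that, in a causal decoder, later tokens have attended to more of the input than earlier ones.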
SGPT: GPT Sentence Embeddings for Semantic Search. Topics: information-retrieval, retrieval, gpt, language-model, semantic-search, text-embedding, sgpt, sentence-embeddings, neural-search, large-language-models. Updated Feb 17, 2024. Jupyter Notebook.
ContextualAI / gritlm (617 stars): Generative ...
Topics: language-model, semantic-search, text-embedding, sgpt, sentence-embeddings, neural-search, large-language-models. License: MIT. 863 stars, 8 watching, 54 forks.