stsb-roberta-large: tuned for Semantic Textual Similarity (STS) tasks, suited to scenarios that require high-precision similarity computation.

Sentence embeddings

Generating sentence embeddings with a pretrained model is straightforward:

from sentence_transformers import SentenceTransformer

# Load a pretrained model
model = SentenceTransformer('all-MiniLM-L6-v2')
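Embeddings produced by model.encode(...) are typically compared with cosine similarity (sentence-transformers offers util.cos_sim for this). A minimal, self-contained sketch of the underlying computation, using toy vectors in place of real embeddings:

```python
from math import sqrt

def cosine_similarity(u, v):
    # Dot product divided by the product of the vector norms.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Toy vectors stand in for model.encode(...) output.
score = cosine_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])  # close to 1.0
```

Identical vectors score 1.0 and orthogonal vectors score 0.0, so higher values mean more similar sentences.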
pip install -U sentence-transformers
pip install -U transformers

Direct use

Sentence-Transformers provides a large number of pretrained models. For STS (Semantic Textual Similarity) tasks, some of the better ones are:

roberta-large-nli-stsb-mean-tokens - STSb performance: 86.39
roberta-base-nli-stsb-mean-tokens - STS...
cross-encoder/stsb-TinyBERT-L-4 - STSbenchmark test performance: 85.50
cross-encoder/stsb-distilroberta-base - STSbenchmark test performance: 87.92
cross-encoder/stsb-roberta-base - STSbenchmark test performance: 90.17
cross-encoder/stsb-roberta-large - STSbenchmark test performance: 91.47
...
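A cross-encoder scores a (sentence1, sentence2) pair jointly rather than embedding each sentence separately; in sentence-transformers this is CrossEncoder('cross-encoder/stsb-roberta-base').predict(pairs). The surrounding workflow can be sketched without downloading a model by passing any scoring callable; toy_score below is a word-overlap stand-in for the real cross-encoder, not part of the library:

```python
def rank_pairs(score_fn, pairs):
    # Score each (sentence1, sentence2) pair and return pairs sorted, best first.
    # In real use, score_fn would be CrossEncoder('cross-encoder/stsb-roberta-base').predict.
    scores = score_fn(pairs)
    return sorted(zip(pairs, scores), key=lambda x: x[1], reverse=True)

def toy_score(pairs):
    # Stand-in scorer: Jaccard word overlap (NOT a real cross-encoder).
    return [len(set(a.split()) & set(b.split())) / max(len(set(a.split()) | set(b.split())), 1)
            for a, b in pairs]

pairs = [("a man is eating", "a man eats"), ("a man is eating", "the sky is blue")]
ranked = rank_pairs(toy_score, pairs)  # the paraphrase pair ranks first
```

Swapping toy_score for the real predict method gives the production version; the ranking logic is unchanged.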
bert-base-nli-stsb-mean-tokens: Performance: STSbenchmark: 85.14
bert-large-nli-stsb-mean-tokens: Performance: STSbenchmark: 85.29
roberta-base-nli-stsb-mean-tokens: Performance: STSbenchmark: 85.44
roberta-large-nli-stsb-mean-tokens: Performance: STSbenchmark: 86.39
distilbert-base-nli-...
import spacy
nlp = spacy.load('en_stsb_roberta_large')

You can obtain the same result without having to install the standalone model, by using this method:

import spacy_sentence_bert
nlp = spacy_sentence_bert.load_model('en_stsb_roberta_large')
...
bert-large-nli-mean-tokens: 79.19 / 87.78
bert-base-nli-stsb-mean-tokens: 85.14 / 86.07
bert-large-nli-stsb-mean-tokens: 85.29 / 86.66
roberta-base-nli-stsb-mean-tokens: 85.44 / -
roberta-large-nli-stsb-mean-tokens: 86.39 / -
distilbert-base-nli-stsb-mean-tokens: 85.16 / -

Applications...
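The "mean-tokens" suffix in these model names refers to the pooling strategy: the transformer's token embeddings are averaged (respecting the attention mask, so padding tokens are ignored) to produce one fixed-size sentence vector. A NumPy sketch of that pooling step:

```python
import numpy as np

def mean_pooling(token_embeddings, attention_mask):
    # Average token embeddings over the sequence axis, ignoring padding
    # positions (attention_mask == 0) -- the "mean-tokens" strategy.
    mask = attention_mask[..., None].astype(float)   # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)   # (batch, dim)
    counts = mask.sum(axis=1).clip(min=1e-9)         # (batch, 1), avoid div by 0
    return summed / counts
```

With one real token at [1, 1], one at [3, 3], and a padded position, the pooled sentence vector is [2, 2].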
model = SentenceTransformer('roberta-large-nli-stsb-mean-tokens')
embeddings = model.encode(all_sentences)  # all_sentences: ['review 1', 'review 2', ...]

1%|          | 10.2M/1.31G [01:36<3:25:46, 106kB/s]

The download is much faster when routed through a proxy.

# Use UMAP to reduce the 1024-dimensional sentence-transformers embeddings to 2 dimensions
...
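The actual UMAP call (umap.UMAP(n_components=2).fit_transform(embeddings), from the umap-learn package) needs an extra dependency, but the reduce-to-2-D step can be illustrated with a PCA stand-in in plain NumPy; the fit/transform shape contract is the same:

```python
import numpy as np

def project_2d(embeddings):
    # PCA stand-in for UMAP: project (n, dim) embeddings onto the top-2
    # principal components, giving an (n, 2) array suitable for plotting.
    X = embeddings - embeddings.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)  # rows of vt = components
    return X @ vt[:2].T

rng = np.random.default_rng(0)
points_2d = project_2d(rng.normal(size=(100, 1024)))  # shape (100, 2)
```

UMAP usually preserves neighborhood structure better than PCA for visualization; this sketch only shows where the dimensionality reduction slots into the pipeline.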
https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/sts/training_stsbenchmark.py

2. Code

This example trains BERT (or any other transformer model, such as RoBERTa, DistilBERT, etc.) from scratch on STSbenchmark. It produces sentence embeddings that can be compared with cosine similarity to measure similarity.
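The training objective used in that STSbenchmark example is sentence-transformers' CosineSimilarityLoss: the cosine similarity of the two sentence embeddings is regressed against the gold similarity score with mean squared error. A NumPy sketch of the loss computation (the library version also backpropagates through the encoder, which is omitted here):

```python
import numpy as np

def cosine_similarity_loss(emb_a, emb_b, gold_scores):
    # MSE between cos(u, v) of each embedding pair and the gold STSb score
    # (gold scores assumed already rescaled to the cosine range).
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    cos = (a * b).sum(axis=1)
    return float(np.mean((cos - gold_scores) ** 2))
```

Identical embedding pairs with a gold score of 1.0 yield a loss of 0; training pushes embedding geometry toward the annotated similarities.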
roberta-large-nli-stsb-mean-tokens: Performance: STSbenchmark: 86.39
distilbert-base-nli-stsb-mean-tokens: Performance: STSbenchmark: 84.38

Trained on Quora Duplicate Question Detection

These models were tuned to detect duplicate questions based on the Quora duplicate questions dataset. It can be...
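With such a model, duplicate detection reduces to encoding the questions and flagging pairs whose embedding similarity clears a threshold. A minimal sketch; the embeddings would come from the Quora-tuned model's encode() (toy vectors here), and the 0.9 threshold is illustrative, not a value from the source:

```python
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def find_duplicates(questions, embeddings, threshold=0.9):
    # Return index pairs (i, j) whose embedding cosine similarity meets the
    # threshold; these pairs are flagged as likely duplicate questions.
    dup = []
    for i in range(len(questions)):
        for j in range(i + 1, len(questions)):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                dup.append((i, j))
    return dup
```

For large question sets, the O(n^2) pair loop would be replaced by a vectorized similarity matrix or approximate nearest-neighbor search, but the thresholding idea is the same.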