Unlike DNABERT, DNABERT-2 does not use BERT's positional embeddings. Because that encoding is learned during pretraining, whatever maximum sequence length was used in pretraining becomes a hard ceiling for every later application: the embeddings cannot extrapolate (i.e., still provide suitable position encodings once sequences grow longer), which rules out many downstream genomics tasks. A non-trained positional encoding, then, is actually...
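The paragraph above is cut off, but the non-trained scheme DNABERT-2 adopts is ALiBi (Attention with Linear Biases): instead of adding learned position vectors to the input, a fixed, distance-proportional penalty is added to the attention scores, so positions beyond the pretraining length are still handled sensibly. Below is a minimal sketch of the idea, not DNABERT-2's actual implementation; the symmetric |i − j| distance used for the bidirectional (BERT-style) case is an assumption here.

```python
import torch

def alibi_bias(seq_len: int, num_heads: int) -> torch.Tensor:
    """Distance-based attention bias; nothing here is learned."""
    # Per-head slopes follow a geometric sequence: 1/2, 1/4, ... for 8 heads.
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / num_heads)
                           for h in range(num_heads)])
    pos = torch.arange(seq_len)
    # Symmetric |i - j| distance (assumed form for a bidirectional encoder).
    distance = -(pos[:, None] - pos[None, :]).abs().float()  # (L, L)
    return slopes[:, None, None] * distance[None, :, :]      # (H, L, L)

# The bias is simply added to the raw attention scores before softmax:
#   scores = q @ k.transpose(-2, -1) / d ** 0.5 + alibi_bias(L, H)
# Because it is a pure function of distance, it extrapolates to any length L.
```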
Moreover, by analyzing the dynamics of DNA breathing, the EPBDxDNABERT-2 model can also reveal the complex mechanisms of gene expression regulation, offering new strategies for disease prevention and treatment.

3.3 Challenges and Future Directions in Genomics

Although the EPBDxDNABERT-2 model has made notable progress in decoding DNA breathing and gene expression regulation, the field of genomics still faces many challenges. First, the complexity and diversity of genomic data make data integration...
Other groups' papers used DNABERT, training a BERT model on DNA sequences, so we'll just train a GPT-2 instead. However the results turn out, the data is already there and so are the downstream benchmarks; it's just pouring old wine into a new bottle...

First, data preparation: the human genome is again used as the training corpus, from scratch...
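For a concrete picture of this setup, here is a hedged sketch of what "train a GPT-2 from scratch on the human genome" can look like with Hugging Face transformers: learn a BPE vocabulary directly from raw DNA text, then build an untrained GPT-2 with a matching vocab size. The corpus file name (hg38_sequences.txt) and all hyperparameters are illustrative assumptions, not the author's exact configuration.

```python
import os

from tokenizers import ByteLevelBPETokenizer
from transformers import GPT2Config, GPT2LMHeadModel, GPT2TokenizerFast

# 1. Learn a BPE vocabulary from raw DNA strings (A/C/G/T), mirroring
#    DNABERT-2's move away from fixed k-mer tokenization.
bpe = ByteLevelBPETokenizer()
bpe.train(files=["hg38_sequences.txt"], vocab_size=4096, min_frequency=2)
os.makedirs("dna_gpt2_tokenizer", exist_ok=True)
bpe.save_model("dna_gpt2_tokenizer")

# 2. Build a small GPT-2 from scratch (randomly initialized, no pretrained weights).
tokenizer = GPT2TokenizerFast.from_pretrained("dna_gpt2_tokenizer")
config = GPT2Config(
    vocab_size=tokenizer.vocab_size,
    n_positions=512,  # max context length; illustrative
    n_layer=12,
    n_head=12,
    n_embd=768,
)
model = GPT2LMHeadModel(config)
print(f"Parameters: {model.num_parameters() / 1e6:.1f}M")
```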
(dna) atrix@Atrix:/mnt/c/Users/adity/OneDrive/Desktop/dnabert2/DNABERT_2/finetune$ sh scripts/run_dnabert2_prom.sh /mnt/c/Users/adity/OneDrive/Desktop/dnabert2/data/balanced_data_prom_vaish/
WARNING:root:Perform single sequence classification...
WARNING:root:Perform single sequence classification...
When I try "sh scripts/run_dnabert2.sh /home/DNABERT_2" I get the following error message:

(base) root@f442fb5fbe89:/home/DNABERT_2/finetune# sh scripts/run_dnabert2.sh /home/DNABERT_2
The provided data_path is /home/DNABERT_2
Using the ...
trainer = Trainer(
    model=model,                      # model and training_args are assumed to be
    args=training_args,               # defined earlier in the (truncated) post
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)
trainer.train()

The training results are as follows (the results screenshot from the original post did not survive extraction). As shown there, binary classification accuracy reaches roughly 80%, not far from DNABERT-class models (around 83%).
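The compute_metrics function passed to the Trainer is not shown in the excerpt; a minimal version that would report the accuracy quoted above might look like this. It is a sketch assuming standard scikit-learn metrics; MCC is included only because it is the headline metric in the DNABERT/DNABERT-2 benchmarks.

```python
import numpy as np
from sklearn.metrics import accuracy_score, matthews_corrcoef

def compute_metrics(eval_pred):
    # eval_pred is a (logits, labels) pair supplied by the Trainer.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "mcc": matthews_corrcoef(labels, preds),
    }
```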
jacobfulano changed the title to "Citing MosaicBERT architecture and code for DNABERT_2 Pretraining Work" and closed the issue as completed on Jan 16, 2024.
If someone really wants to give it a go, this might be promising to use instead. But currently, implementation is a bit more involved than just switching out a few lines on DNABERT_2.