audio open-source data opensource opendata corpus open-data dataset audio-data datasets russian-datasets audio-datasets chinese-dataset voice-dataset voice-datasets audio-dataset voice-data sova-dataset english-datasets Updated Nov 8, 2022
A Chinese character dataset that includes information about each character, such as stroke count, radical, pinyin, and English definitions/synonyms.
[3] TabFact: A Large-scale Dataset for Table-based Fact Verification. ICLR 2020
[4] MultiFC: A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims. EMNLP 2019
[5] A Richly Annotated Corpus for Different Tasks in Automated Fact-Checking. CoNLL 2019
[6] X-Fact: A New ...
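PsyQA: A Chinese Dataset for Generating Long Counseling Text for Mental Health Support. Contents: Quick read of the paper; 1 Overview; Verbal behavior in counseling; NLP for mental health detection and treatment; Text-based mental health datasets; 4 Data collection; Data sources; Data cleaning; Strategy annotation; Annotation quality control; Corpus analysis...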
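Dataset | file | notes
alpaca-chinese | alpaca-chinese-52k.json | The full set of 52k English and Chinese samples
alpaca-chinese | ./data/alpaca_chinese_part*.json | Split data files
Case 1, idioms: some samples need a second rewrite after literal translation, for example idiom questions: {"en_instruction":"What is the meaning of the following idiom?","instruction":"以下成语是什么...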
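For readers who want to work with the split files, the following is a minimal sketch of how they might be loaded. It assumes, without confirmation from the source, that each file matching `./data/alpaca_chinese_part*.json` holds a JSON array of objects; `load_parts` is a hypothetical helper name. The `en_instruction` and `instruction` keys mentioned in the comment come from the sample quoted above.

```python
import glob
import json


def load_parts(pattern="./data/alpaca_chinese_part*.json"):
    """Load and concatenate records from every file matching `pattern`.

    Assumption: each file contains a JSON array of objects, for example
    objects carrying "en_instruction" and "instruction" keys.
    """
    records = []
    # Sort the paths so the record order is reproducible across runs.
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as f:
            records.extend(json.load(f))
    return records
```

Calling `load_parts()` from the repository root would return one flat list of records. If no file matches the pattern, the result is an empty list.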
Chinese_dataset5w.zip (144.34M)
File Name | Size | Update Time
Chinese_dataset5w_rec_test_win.txt | 178792 | 2023-03-28 21:29:10
Chinese_dataset5w_rec_test.txt | 168792 | 2023-03-28 21:29:10
Chinese_dataset5w.txt | 4203619 | 2023-03-28 21:29:10
Chinese_dataset5w/img_000000...
A large collection of dialogues between patients and doctors must be annotated for medical named entities to build intelligence for telemedicine. However, since most patients involved in telemedicine deliver related named entities in informal and long mu
To promote research in Chinese information extraction and evaluate the performance of related systems, we build a large-scale high-quality dataset, named DuIE, and make it publicly available. We design an efficient coarse-to-fine procedure including candidate generation and crowdsourcing annotation, in...
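https://github.com/JiangYanting/Chinese_book_dataset Chinese_book_dataset: Chinese book classification dataset / natural language processing / Chinese Library Classification / library and information science / data mining / text classification. If you use this dataset in a research paper or engineering project, please cite: Jiang Yanting, Hu Renfen. Research on Book Representation Learning and Multi-label Classification Based on the BERT Model [J]. New Century Library (library and information science CSSCI...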
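Web dataset: To collect the web dataset, we used MTWI [30], which contains 20,000 Chinese and English web text images from 17 different categories on the Taobao website. These text samples appear in a wide range of scenes, typography, and designs. We obtained 140,589 text images from the training set and manually split them at a ratio of 8:1:1, yielding 112,471 samples for training, 14,059 samples for validation, and 14,059 samples for...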
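The 8:1:1 split quoted above can be checked with a short calculation. This is a minimal sketch, not the authors' procedure: the function name `split_counts` and the rounding rule (round the two smaller shares, give the remainder to training) are assumptions chosen for illustration.

```python
def split_counts(total, ratios=(8, 1, 1)):
    """Return the three subset sizes for a given split ratio.

    Assumption: the second and third shares are rounded to the nearest
    integer and the first share receives the remainder, so the three
    sizes always sum to `total`.
    """
    denom = sum(ratios)
    second = round(total * ratios[1] / denom)
    third = round(total * ratios[2] / denom)
    first = total - second - third
    return first, second, third


first, second, third = split_counts(140_589)
print(first, second, third)
```

For 140,589 images this yields 112,471, 14,059, and 14,059, which matches the three counts given in the passage.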
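3. Re: [Paper reading] DocRED: A Large-Scale Document-Level Relation Extraction Dataset [ACL2019] Does the author know the difference between #Inst and #Fact in the statistics? I don't quite understand how relation instances differ from relation facts, or whether the two stand for something else. --Kiruti
4. Re: [Code walkthrough] DocRED: A Large-Scale Document-Level Relation Extraction Dataset (2) @嗒嗒的...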