Chase: A Large-Scale and Pragmatic Chinese Dataset for Cross-Database Context-Dependent Text-to-SQL Jiaqi Guo, Ziliang Si, Yu Wang, Qian Liu, Ming Fan, Jian-Guang Lou, Z. Yang, Ting Liu 2021 A Review of Cross-Domain Text-to-SQL Models ...
An Electroencephalography (EEG) dataset utilizing rich text stimuli can advance the understanding of how the brain encodes semantic information and contribute to semantic decoding in brain-computer interface (BCI). Addressing the scarcity of EEG datasets
To promote research in Chinese information extraction and evaluate the performance of related systems, we build a large-scale high-quality dataset, named DuIE, and make it publicly available. We design an efficient coarse-to-fine procedure including candidate generation and crowdsourcing annotation, in...
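3. Re: [Paper Reading] DocRED: A Large-Scale Document-Level Relation Extraction Dataset [ACL2019] Does the author know the difference between #Inst and #Fact in the statistics? I don't quite understand how relation instances differ from relation facts, or do these two stand for something else? --Kiruti 4. Re: [Code Walkthrough] DocRED: A Large-Scale Document-Level Relation Extraction Dataset (2) @嗒嗒的...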
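The dataset was collected by the University of California, San Diego (UCSD). Github: https://github.com/UCSD-AI4H/Medical-Dialogue-System Chinese: The Chinese medical dialogue dataset contains 1.1 million doctor-patient dialogues sourced from 好大夫 (http://haodf.com/), spanning 2010 to 2020, and was collected and organized by the University of California, San Diego. Github: https://github.com...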
In the present study, a Chinese Conceptual semantic Feature Dataset (CCFD) was established with 1,410 concepts, including their semantic features and the similarity between concepts. The concepts were manually grouped into 28 subordinate categories and seven superordinate categories. The results showed ...
Evaluation of Dataset for Different Models
AFQMC (蚂蚁语义相似度, Ant Semantic Similarity), Accuracy:
Model | Dev set (dev) | Test set (test) | Training parameters
ALBERT-xlarge | - | - | batch_size=16, length=128, epoch=3
ALBERT-tiny | - | 69.92% | batch_size=16, length=128, epoch=3
...
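The accuracy figures in this entry are the share of sentence pairs whose predicted label matches the gold label. A minimal sketch of that computation follows; the function name `accuracy` and the toy label lists are hypothetical, and the 0/1 labels (1 = similar, 0 = not similar) are an assumption for illustration, not data from the benchmark.

```python
def accuracy(predictions, labels):
    """Return the fraction of predictions that equal the gold label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == g for p, g in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical toy data: 1 = the sentence pair is similar, 0 = it is not.
gold = [1, 0, 0, 1, 0]
pred = [1, 0, 1, 1, 0]

# Four of the five predictions match the gold labels.
print(f"{accuracy(pred, gold):.2%}")  # prints 80.00%
```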
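secsilm/zi-dataset (115 stars): a Chinese character (hanzi) dataset containing information about each character, such as stroke count, radical, pinyin, and English definitions/synonyms. Topics: nlp, dataset, chinese-nlp, hanzi, nlp-datasets, chinese-dataset. Updated Jul 17, 2020
QBQTC: a large-scale search matching dataset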
Dataset | file | notes
alpaca-chinese | alpaca-chinese-52k.json | the full dataset of 52k English and Chinese samples
alpaca-chinese | ./data/alpaca_chinese_part*.json | split data files
Case 1, idioms: some samples need a second rewrite after literal translation, for example idioms: { "en_instruction": "What is the meaning of the following idiom?", "instruction": "以下成语...
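CDLA: A Chinese document layout analysis (CDLA) dataset Introduction: CDLA is a Chinese document layout analysis dataset aimed at Chinese literature (academic paper) scenarios. It contains the following 10 labels: Text (正文), Title (标题), Figure (图片), Figure caption (图片标题), Table (表格), Table caption (表格标题), Header (页眉), Footer (页脚), Reference (注释), Equation (公式). It contains 5,000 training images and 1,000 validation...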
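When working with a label set like the one listed for CDLA, a common first step is a two-way mapping between label names and integer ids. The sketch below uses the ten English label names from the entry. The id assignment (listing order, starting at 0) and the names `CDLA_LABELS`, `LABEL_TO_ID`, and `ID_TO_LABEL` are assumptions for illustration; the dataset's own annotation files may number the classes differently.

```python
# The ten label names as listed in the CDLA description.
CDLA_LABELS = [
    "Text", "Title", "Figure", "Figure caption", "Table",
    "Table caption", "Header", "Footer", "Reference", "Equation",
]

# Assumption: ids follow the listing order and start at 0.
# Check the dataset's annotation files before relying on these numbers.
LABEL_TO_ID = {name: idx for idx, name in enumerate(CDLA_LABELS)}
ID_TO_LABEL = {idx: name for name, idx in LABEL_TO_ID.items()}

print(len(CDLA_LABELS))       # prints 10
print(LABEL_TO_ID["Table"])   # prints 4
print(ID_TO_LABEL[9])         # prints Equation
```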