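In addition, some REUTERS tags carry a sixth attribute, CSECS, which can be ignored. The dataset documentation is in the attachment. Processing the files is somewhat tricky; one workable approach is to use Java regular expressions.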
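As a rough illustration of the regular-expression approach, the sketch below pulls the attributes out of an opening REUTERS tag and skips CSECS. The class name `ReutersTagParser`, the method `parseAttributes`, and the sample tag (including its CSECS value) are assumptions made for this example, not part of the dataset documentation; it is a minimal sketch, not a full parser for the SGML files.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts attributes from an opening <REUTERS ...> tag.
class ReutersTagParser {
    // Matches the opening REUTERS tag and captures its attribute list.
    private static final Pattern TAG = Pattern.compile("<REUTERS\\s+([^>]*)>");
    // Matches one NAME="value" pair inside the attribute list.
    private static final Pattern ATTR = Pattern.compile("([A-Z]+)=\"([^\"]*)\"");

    static Map<String, String> parseAttributes(String line) {
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher tag = TAG.matcher(line);
        if (!tag.find()) {
            return attrs; // no REUTERS tag on this line
        }
        Matcher attr = ATTR.matcher(tag.group(1));
        while (attr.find()) {
            String name = attr.group(1);
            if (name.equals("CSECS")) {
                continue; // the optional sixth attribute is ignored
            }
            attrs.put(name, attr.group(2));
        }
        return attrs;
    }

    public static void main(String[] args) {
        // Illustrative tag; the CSECS value here is made up for the example.
        String sample = "<REUTERS TOPICS=\"YES\" LEWISSPLIT=\"TRAIN\" "
                + "CGISPLIT=\"TRAINING-SET\" OLDID=\"5544\" NEWID=\"1\" CSECS=\"0\">";
        System.out.println(parseAttributes(sample));
    }
}
```

A `LinkedHashMap` is used so the attributes come back in the order they appear in the tag. Regular expressions are adequate for flat tags like this one; the nested body elements of each article may be easier to handle with a proper SGML or XML-style parser.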
Data summary: This is a very frequently used test set for text categorisation tasks.
Keywords: data mining, Reuters, text categorization, text classification
Data format: TEXT
Data use: The data can be used for data mining and analysis.
Detailed description: The Reuters-21578...
Various researchers have prepared data files useful for work with Reuters-21578. Contact me if you would like me to host such resources here; I am happy to if their disk space requirements are modest. Currently the only such resource available here is a PROLOG fact base about countries contributed by Ronen Feldman.